Preface


The purpose of this study was to investigate the application of a conservative
synchronization paradigm for parallel discrete event simulations; specifically, to solve the
classical pool balls simulation. This thesis effort has demonstrated that the conservative
approach can produce performance comparable to that of an optimistic approach by comparing
the results of the pool balls simulations produced at Cal Tech with those of this thesis.

This thesis effort also demonstrates the viability of spatially partitioning a simulation
model in a conservative environment. Several design approaches were analyzed and their
respective advantages and disadvantages were derived. Two conservative formulations
for a minimum safe time were developed, both of which maintain system correctness and
simulation progress. The tradeoff between them is shown to be scalability versus execution
efficiency. The more efficient, less scalable one was chosen for empirical study.

In the development of the simulation software, I had to acquaint myself with the C
programming language. Were it not for the invaluable help of Lt Kevin Hanrahan, I would
never have developed a working program. I would like to thank Professor Gary Lamont
for his ceaseless encouragement and brilliant inspiration. Without his help, I would have
pursued many wrong approaches in system design, theory and writing. He was particularly
useful in the application of predicate logic for developing the theorems and proofs presented
in this thesis.

I am deeply indebted to my wife Kelly and my daughter Rachel for their encouragement,
support, patience and love. The time required to fulfill the requirements of Master
of Science of Computer Engineering has been extraordinary. Without the direct support of
my family, I could not have maintained the exacting pace demanded of my studies. Thank
you.


Robert  S.  Moser 


Table of Contents

Preface
Table of Contents
List of Figures
List of Tables
Abstract

I. Introduction
1.1 Background
1.2 Thesis Statement
1.3 Scope
1.4 Research Objectives
1.5 Assumptions
1.6 General Approach
1.7 Summary

II. Issues in Distributed Discrete Event Simulation
2.1 Introduction
2.2 Motivation for Parallel Computing
2.3 Flynn's Taxonomy
2.4 Tightly Coupled VS Loosely Coupled MIMD Architectures
2.5 Hypercube MIMD Architectures
2.6 Performance Measures
2.7 Taxonomy of Simulations and Simulation Models
2.8 Distributed Discrete Event Simulation Paradigms
2.9 The Theories of Chandy and Misra
2.10 Event Modeling
2.11 Spatial Partitioning

III. Requirements Analysis
3.1 Introduction
3.2 Requirements
3.3 Developing the Equations of Motion
3.3.1 Event Calculations for Collisions with Cushions
3.3.2 Calculations for Partition and Exit Events
3.3.3 Event Calculations for Collisions Between Pool Balls
3.4 Simulation Environment
3.4.1 Simulation Driver
3.4.2 The Next Event Queue Object
3.4.3 The Event Object

IV. Software Design
4.1 Introduction
4.2 Design of a Sequential Simulation without Spatial Partitioning
4.2.1 Design of the Ball Object
4.2.2 Design of the Ball Object Manager
4.2.3 Design of the Table Sector Object
4.2.4 Design of the Table Sector Manager
4.2.5 Design of the Random Number Generator Object
4.2.6 Design of the EventHandler Object
4.2.7 Event Definitions
4.2.8 Algorithm Design
4.2.9 Design of the Queue Structures
4.2.10 Version 1 Structure Chart
4.2.11 Command Line Arguments
4.3 Design of a Spatially Partitioned Sequential Simulation
4.3.1 Changes to the Ball Object
4.3.2 Changes to the Table Object
4.3.3 Changes to Event Definitions
4.3.4 Changes to the Simulation Algorithm
4.4 Design of a Parallel Simulation
4.4.1 Changes to the Ball Object
4.4.2 Changes to the Table Object
4.4.3 Changes to the Candidate Queue Structure
4.4.4 Changes to the Next Event Queue Structure
4.4.5 Changes to the Simulation Algorithm

V. Parallel Simulation Design and Implementation
5.1 Introduction
5.2 Design of a Parallel Simulation Model
5.2.1 Modeling Pool Balls as Transient Entities
5.2.2 Modeling the Pool Table as Multiple Resident Entities
5.3 Developing the Minimum Safe Time Calculation
5.3.1 Additional Properties of MST
5.3.2 Analysis of
5.4 Developing an Alternative MST Calculation
5.5 Developing a Second Minimum Safe Time Calculation
5.6 Developing a Third MST Calculation
5.7 Analyzing Alternative MST Calculations
5.7.1 An Example using Both MST’s
5.8 Selecting an MST Formulation
5.8.1 Implementing the Minimum Safe Time
5.8.2 Implementing Wieland’s Data Replication Strategy
5.9 Summary

VI. Test Results
6.1 Introduction
6.2 Verification and Validation
6.3 Simulation Performance Test Plan
6.3.1 Defining the Variables of Interest
6.3.2 Defining the Constants
6.3.3 The Test Plan
6.4 Test Results
6.4.1 Analysis of Pool Table Sectoring
6.5 Comparison of AFIT and Cal Tech Simulation Results

VII. Conclusions and Recommendations
7.1 Introduction
7.2 Impact of Computational Load
7.3 Impact of Spatial Partitioning
7.4 Determining the Optimal Number of Sectors
7.5 Impact of the Conservative Paradigm upon Scalability
7.6 Conservative Versus Optimistic Paradigms
7.7 Recommendations for Future Study

Appendix A. Software Listings
A.1 Software Files
A.1.1 Functional Description
A.2 Compiling Instructions


List of Figures

Figure

1. A Partitioned Pool Table with a Future Collision
2. A Queuing Network that can Deadlock
3. Wieland’s Grid to Grid Proximity Detection
4. Layout of a Pool Table on an X/Y Axis System
5. Two Balls Colliding at the Point of Impact
6. Translating an X/Y Axis System
7. Data Structure for Storing Pool Balls
8. Version 1 Structure Chart
9. Level 1 Data Flow Diagram
10. Level 2 Data Flow Diagram for Process 2.0
11. Level 2 Data Flow Diagram for Process 4.0
12. A Four Node, Four Sector Process Graph
13. Performance Curves for 10 Balls
14. Performance Curves for 20 Balls
15. Performance Curves for 30 Balls
16. Performance Curves for 40 Balls
17. Performance Curves for 50 Balls
18. Performance Curves for 100 Balls
19. Performance Curves for 200 Balls
20. Speedup Curves as a Function of Load Factor
21. Speedup Curves as a Function of Cube Size
22. Efficiency Curves as a Function of Load Factor
23. Efficiency Curves as a Function of Cube Size
24. Density Curves as a Function of Load Factor
25. AFIT Speedup Curves for 120 & 160 Pool Balls
26. Curve Fitting to the Optimal Sector Data


List of Tables

Table

1. Order of Analysis for the Ball Manager Data Structure
2. Command Line Arguments


AFIT/GCS/ENG /GCE91D-15 


Abstract 

This study investigated the application of a conservative synchronization paradigm
to the classical, distributed pool balls simulation executed on an eight node, Intel iPSC/2
hypercube. Wieland’s concept of spatial partitioning and limited data replication was used.
Analysis has shown that 100% parallelization of execution is possible in a conservative
environment via assignment of multiple sectors to nodes. Two conservative formulations for
minimum safe time were derived. A tradeoff exists between scalability and efficiency.
Optimum sectoring prediction has been shown possible through application of linear regression
techniques. The results of this research reveal that a conservative approach to distributed,
discrete event simulations can achieve significant speedup.
A Spatially Partitioned
Parallel  Simulation  of 
Colliding  Objects 


I. Introduction

This thesis investigates the application of a conservative synchronization paradigm
for the execution of parallel discrete event simulations implementing spatially partitionable
models. The methodology developed for this thesis is of particular interest to operations
researchers and simulation system designers because a parallel processor having N nodes
can potentially execute a distributed simulation up to N times faster than a single
processor (21:315). The realization of this potential increase in performance is the primary
motivation to distribute simulations over many processors. Furthermore, a distributed
approach may be the only practical or possible solution to some large, complex models.
For example, Misra has investigated a sequential simulation of a complex telephone switch.
Misra assumed that the switch can generate about 100 internal messages while completing
a local telephone call and that 100 calls per second can be accommodated by a complex
switch. A sequential simulation simulating 15 minutes of real time will generate nearly
10 million messages requiring several hours of simulation time on a very fast uniprocessor
(18).
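
The size of this workload follows from simple arithmetic. The short sketch below reads the figures above as 100 internal messages per call and 100 calls per second; that reading is an assumption, and the variable names are illustrative only.

```python
# Back-of-the-envelope check of the message volume quoted above.
# Assumption: 100 internal messages per call and 100 calls per second.
MESSAGES_PER_CALL = 100
CALLS_PER_SECOND = 100
SIMULATED_SECONDS = 15 * 60  # 15 minutes of real time

total_messages = MESSAGES_PER_CALL * CALLS_PER_SECOND * SIMULATED_SECONDS
print(total_messages)  # 9000000, i.e. nearly 10 million
```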

This thesis develops a general methodology from which many systems may be modeled
in a distributed manner. This methodology is developed from the investigation of a
specific distributed simulation in which pool balls move about on a pool table and collide
with one another and with the pool table with perfect elasticity. There are several reasons
why this simulation has been chosen for analysis.

1.  The  model  is  simple  to  comprehend,  thereby  emphasizing  the  process  of  developing 
a  distributed  simulation  and  not  the  simulation  itself. 

2. Basic simulation parameters such as computational loading, number of processors,
number of logical processes, number of simulated objects, etc., are easily varied for
performance measurement and analysis.
3. The pool balls simulation concept has become a classical simulation problem, having
been conceived and developed at Cal Tech under the title ‘Colliding Pucks’ (13)
and benchmarked at the Jet Propulsion Laboratory under the title ‘Pool Balls
Benchmark’ (4, 3). The pool balls simulation concept has also been studied using a modified
version of the Time Warp operating system (17) and using a time driven simulation
approach (7).

4. The published benchmark results from the Jet Propulsion Lab allow some performance
comparisons to be made between the conservative and optimistic paradigm
implementations for the pool balls simulation.

1.1 Background

According to Banks and Carson, ‘a simulation is the imitation of the operation of a
real-world process or system over time (2:2).’ A model is a representation of a real-world
system  and  takes  the  form  of  a  set  of  assumptions  concerning  the  behavior  of  the  system. 
If  the  model  of  a  real-world  system  accurately  reflects  the  behavior  of  the  system,  then  a 
simulation  can  be  used  to  study  the  system  without  changing  the  real-world  system.  This 
form  of  experimentation  can  increase  a  user’s  knowledge  of  the  system.  Simulation  can  also 
be  used  to  experiment  with  models  of  systems  that  do  not  yet  exist,  thereby  providing  an 
often  used  system  design  tool  for  complex  and  costly  systems  (2:4).  Pritsker  (19:6)  states 
that  simulations  of  real-world  systems  provide  the  experimenter  with  inferences  about 
systems 


‘. . . without building them, if they are only proposed systems; without disturbing
them, if they are operating systems that are costly or unsafe to experiment
with; without destroying them, if the object of an experiment is to determine
their limits of stress.’

For the Department of Defense, military battle simulations precisely conform to all
of Pritsker's observations. Wars are certainly unsafe to experiment with. During times
of peace, battle simulations provide valuable information concerning our preparedness for
war. Many battle simulations are complex enough that sequential uniprocessors require
several hours and even days to simulate a relatively short tactical scenario (10). There are
still several questions regarding the parallelization of complex simulations. These questions
include the following:
1. How should the problem domain be partitioned?

2.  Which  method  of  synchronization  should  be  used? 

3.  Can  load  balancing  be  achieved  across  N  processors? 

This  thesis  investigates  each  of  these  questions  in  the  context  of  the  classical  pool 
balls  simulation. 

1.2 Thesis Statement

A concurrently executing non-queueing model discrete event simulation can achieve
near linear speedup when implemented with a conservative synchronization paradigm
incorporating spatial partitioning and limited data replication.

1.3 Scope

This thesis effort investigates the parallelization of a conservative discrete event
simulation incorporating a classical, spatially partitionable model. The specific model chosen
for investigation is the well defined pool balls problem. The conclusions are applicable to
any spatially partitionable model such as battlefield simulations. The distributed software
design is object oriented and spatially partitioned. The movement of each pool ball is
recorded to disk for analysis and graphics display. The user can specify the number of
pool balls to simulate, the number of logical processes, the number of physical processes,
the simulation run time and the pool table dimensions. The user may also specify various
options such as writing to disk, printing to screen, collecting statistics and checking for
errors. Each option selected affects the execution time of the overall simulation.

Each pool ball is created during initialization. The parameters of position, specified
as X-Y coordinates, and velocity, specified as X-Y vectors, are randomly generated
using a machine independent pseudo random number generator developed by Law and
Kelton (14).
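
The initialization step can be sketched as follows. This is a minimal illustration and not the thesis software: it substitutes the widely known Park-Miller "minimal standard" multiplicative generator for the Law and Kelton generator, and the names `Lcg`, `Ball` and `make_balls` are hypothetical. The sketch does not check whether two balls overlap.

```python
from dataclasses import dataclass

M = 2**31 - 1   # modulus of the Park-Miller "minimal standard" generator
A = 16807       # multiplier of the same generator


class Lcg:
    """Portable multiplicative congruential generator (a stand-in, see above)."""

    def __init__(self, seed: int) -> None:
        if not 0 < seed < M:
            raise ValueError("seed must satisfy 0 < seed < 2**31 - 1")
        self.state = seed

    def uniform(self, low: float, high: float) -> float:
        """Advance the generator and scale the result between low and high."""
        self.state = (A * self.state) % M
        return low + (high - low) * self.state / M


@dataclass
class Ball:
    x: float   # position, X coordinate
    y: float   # position, Y coordinate
    vx: float  # velocity, X component
    vy: float  # velocity, Y component


def make_balls(count, width, height, max_speed, seed=12345):
    """Create `count` balls with random positions and velocities on the table."""
    rng = Lcg(seed)
    balls = []
    for _ in range(count):
        x = rng.uniform(0.0, width)
        y = rng.uniform(0.0, height)
        vx = rng.uniform(-max_speed, max_speed)
        vy = rng.uniform(-max_speed, max_speed)
        balls.append(Ball(x, y, vx, vy))
    return balls
```

Because the generator uses only integer arithmetic below 2^46, the same seed yields the same balls on any machine, which is the property that makes sequential and parallel runs comparable.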

The performance of this pool table simulation is compared to that of Cal Tech’s
‘Colliding Pucks’ experiment to gain insight towards the desirability of the conservative
synchronization paradigm over the optimistic synchronization paradigm. This is particularly
important because the conservative paradigm has been shown to require only as much
memory as its sequential counterpart while the optimistic paradigm requires large amounts
of memory and has the potential to exhaust memory before simulation termination.
1.4 Research Objectives

The  objectives  of  this  thesis  effort  are: 

1. To demonstrate by example that a spatially partitionable model discrete event
simulation can be parallelized on distributed loosely coupled processors using a
conservative synchronization paradigm.

2.  To  demonstrate  that  the  parallelization  of  a  spatially  partitionable  model  discrete 
event  simulation  can  achieve  reasonable  speedup. 

3.  To  demonstrate  that  the  conservative  synchronization  paradigm  is  comparable  to  an 
optimistic  synchronization  paradigm  when  applied  to  a  spatially  partitionable  model 
parallel  discrete  event  simulation. 

1.5  Assumptions 

Several assumptions were made in the analysis, design and development of the pool
balls simulation. As a minimum, the following equipment and hardware specifications were
assumed:

1.  A  distributed  loosely  coupled  hypercube  having  eight  or  more  nodes. 

2. Monotonicity of message traffic between nodes is strictly maintained. This requirement
states that if messages occur at times $t_i$ such that
$0 < t_1 < t_2 < \cdots < t_i$, then the target nodes will receive the messages in the same order.

3.  Dynamic  allocation  and  deallocation  of  memory  is  allowed. 
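
The monotonicity assumption amounts to first-in, first-out delivery on each link between nodes. A minimal sketch of that property is given below; the `FifoLink` class is hypothetical and stands in for the message system of the hypercube, whose actual interface is not shown here.

```python
from collections import deque


class FifoLink:
    """Hypothetical one-way link that preserves send order (not the iPSC/2 API)."""

    def __init__(self):
        self._queue = deque()
        self._last_sent = 0.0

    def send(self, timestamp, payload):
        # Enforce strictly increasing timestamps on the sending side.
        if timestamp <= self._last_sent:
            raise ValueError("timestamps must be strictly increasing")
        self._last_sent = timestamp
        self._queue.append((timestamp, payload))

    def receive(self):
        # Messages come out in exactly the order they went in.
        return self._queue.popleft()


link = FifoLink()
for t in (1.5, 2.0, 7.25):
    link.send(t, f"event@{t}")
received = [link.receive()[0] for _ in range(3)]
print(received)  # [1.5, 2.0, 7.25]
```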

1.6  General  Approach 

The  general  approach  for  this  thesis  consisted  of  six  steps: 

1. A literature search was conducted. Familiarity with different types and classes of
simulations was acquired and an understanding of the Chandy-Misra paradigm
developed.

2. The requirements analysis was performed for the software system. The assumptions
generated were required to conform to those developed by Cal Tech to provide a
means for performance comparison.
3. The requirements analysis was validated and the pool balls simulation was designed.
An object oriented approach was used to enhance software documentation,
maintenance and changeability.

4.  The  software  was  written  in  a  three-step  process.  First,  a  sequential  pool  balls 
simulation  was  designed  implementing  a  non-partitioned  pool  table.  Second,  the 
sequential  system  from  the  first  stage  was  modified  so  that  the  pool  table  could  be 
partitioned  into  vertical  ‘slices’.  Third,  the  partitioned  pool  table  resulting  from 
the  second  stage  was  parallelized  on  the  Intel  iPSC/2  hypercube.  All  three  software 
versions  were  coded  in  the  C  programming  language. 

5. Various test simulations were developed which provided the speedup estimates and
performance comparisons between Cal Tech's experiments and AFIT's experiments.
Tests were executed using all three software stages to demonstrate output consistency.


1.7  Summary 

A spatially partitionable model discrete event simulation can be parallelized onto a
distributed, loosely coupled processor. Near linear speedup is achievable using a
conservative synchronization paradigm. These assertions are demonstrated by implementing the
well documented, classical pool balls simulation which was conceived and developed by
scientists at Cal Tech and later benchmarked at the Jet Propulsion Laboratory. A
conservative paradigm results in comparable performance to the Time Warp optimistic paradigm
based upon the reported results from Cal Tech and the Jet Propulsion Lab.
II. Issues in Distributed Discrete Event Simulation


2.1 Introduction

This  chapter  surveys  current  literature  on  topics  related  to  this  thesis.  This  review 
is  limited  to  on-going  research  in  parallel  discrete  event  simulations  and  briefly  discusses 
various  classes  of  computer  architecture  used  in  distributed  processing. 

2.2 Motivation for Parallel Computing

If a computer performs one instruction at a time in sequential fashion then the
only possibility for increasing execution performance is to increase the speed at which
instructions are performed. Despite the fact that VLSI technology has been doubling the
performance of computing hardware every couple of years, it is doubtful that this trend
can continue beyond the 21st century and is certain not to continue indefinitely (9:23).

An alternative approach for increasing execution performance is to design and use
computer architectures that perform multiple instructions simultaneously. If a sequential
processor requires $T_s$ time to complete a process, then a parallel processor having M
processors requires a lower bound of $T_s / M$ time to complete the same process, provided that
each of the M processors is equal in power to the sequential processor and that all M
processors are 100 percent utilized. The increase in run time performance is a factor less
than or equal to M. This theoretical upper bound on parallel computing performance is
the primary motivation towards parallel computing. In the future, this may be the only
means available to decrease execution time. Even if technology can continue increasing the
speed of sequential processors, the rate of performance increase is significantly less than
the potential of establishing an M-fold increase by using massive parallelism.
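
The bound can be illustrated numerically. The sketch below writes the sequential time as `sequential_time` and the processor count as `processors`; the `utilization` parameter is an illustrative simplification added here (not part of the argument above) to show how idle processors move the run time away from the ideal bound. The 120 second figure is invented for the example.

```python
def parallel_time(sequential_time, processors, utilization=1.0):
    """Idealized run time on `processors` equal processors.

    With utilization == 1.0 this is the lower bound: sequential_time / processors.
    Lower utilization (an assumed, simplified model) lengthens the run.
    """
    if processors < 1:
        raise ValueError("processors must be at least 1")
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return sequential_time / (processors * utilization)


# Hypothetical 120 second sequential run on 1, 2, 4 and 8 processors.
for m in (1, 2, 4, 8):
    print(m, parallel_time(120.0, m), parallel_time(120.0, m, utilization=0.5))
```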

2.3 Flynn’s Taxonomy

Michael J. Flynn classified all digital computers into four categories according to
the types of instruction and data streams used. An instruction stream is a sequence of
instructions executed by a given computer. A data stream is a sequence of data representing
input, output or temporary results used to calculate the output. Flynn’s four categories
are (12:32):

1.  Single  Instruction  stream,  Single  Data  stream  (SISD). 
2. Single Instruction stream, Multiple Data stream (SIMD).

3. Multiple Instruction stream, Single Data stream (MISD).

4.  Multiple  Instruction  stream,  Multiple  Data  stream  (MIMD). 

The simplest architecture is the SISD class. These computers execute one instruction at
a time and operate on one piece of data at a time. The most complex architecture is the
MIMD class. MIMD computers execute multiple instructions simultaneously on different
data. One may think of a MIMD computer as several processors tied together, each
processor of which is a fully functional and often times powerful computer. The manner
in which the processors of a MIMD architecture are tied together distinguishes loosely
coupled and tightly coupled MIMD computers.

2.4 Tightly Coupled VS Loosely Coupled MIMD Architectures

The individual processors of a parallel processor architecture must cooperate with
one another in order to solve a particular application. This cooperation often entails the
sharing of data structures and variables which reside in memory. One approach to parallel
architecture design is to have a global memory which each processor may access. This
shared memory design is referred to as tightly coupled. This design has the advantage of
internodal communications at memory speeds. The disadvantages include bus contention,
cache coherence and memory access. Memory can only be read from or written to one
address at a time; hence, the individual processors of a tightly coupled architecture often
‘fight’ over access rights to memory. Current bus architecture technology limits the number
of processors to 40 or 50. Cache coherence is a problem in that multiple processors may alter
variables in memory even though some or all of the variables reside within an individual
processor’s cache. This poses the problem of having multiple variable values within the
local caches of different processors (9:19).

An alternative approach to parallel architecture design is to have separate local
memories owned and controlled by the individual processors. Such a design is referred
to as loosely coupled. These designs prohibit processors from accessing memory variables
outside of local memory. These variables must then be communicated via message passing
which is considerably slower than the memory speeds achieved by the shared memory
concept discussed above, but cache coherence and bus contention are not problems and
the size of the architecture is scalable to several thousand processors depending upon the
connectivity between the processors (9:20,21). Research continues as to the applicability
of both architecture designs towards specific problems. This thesis effort focuses on the
application of the Intel iPSC/2 hypercube loosely coupled MIMD architecture towards the
classical pool balls simulation.

2.5 Hypercube MIMD Architectures

A loosely coupled distributed architecture must send and receive communicated
variables across interconnecting communications networks which are significantly slower than
CPU cycles, bus cycles or even memory cycles. One approach toward keeping
communications time to a minimum is to have each processor (often times referred to as a ‘node’)
directly connected to every other processor so that the communications line is both short
and direct. This type of fully connected MIMD architecture is known as a crossbar.
Crossbars require $M^2$ links between M nodes which is expensive both in terms of hardware
cost and size. Scalability of crossbars is severely restricted since the communications links
grow with the square of M. Thus, the communications time complexity is $O(1)$ at the cost
of $O(M^2)$ links (9:114-116). An architecture which uses fewer communications links has
greater scalability but greater communications time. The least number of links possible
between nodes is two, represented by linear arrays and ring networks. These
interconnection networks have communications time complexity of $O(M)$ at the cost of only $O(1)$
links (9:114-116). These types of networks work well if the application requires nodes to
share data between themselves and their immediate neighbor.
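
The tradeoff between link count and communications distance can be made concrete with a small sketch. The function names are illustrative. For a fully connected network of bidirectional links the exact count is M(M-1)/2, which grows as the square of M; a ring needs only M links but a message may have to cross up to half of them.

```python
def fully_connected_links(m):
    """Bidirectional links needed to join every pair of m nodes."""
    return m * (m - 1) // 2


def ring_links(m):
    """Links in a ring of m nodes (m >= 3): one per node."""
    return m


def ring_max_hops(m):
    """Worst-case number of links a message crosses in a ring of m nodes."""
    return m // 2


for m in (4, 8, 16):
    # A fully connected network always needs exactly one hop.
    print(m, fully_connected_links(m), 1, ring_links(m), ring_max_hops(m))
```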

The hypercube enjoys the greatest popularity amongst loosely coupled MIMD
architectures because of its versatility, scalability and communications time. A hypercube
has $M = 2^m$ processors interconnected as a binary cube. Each processor is a fully self
contained computer with its own clock, CPU and local memory. Each processor also has
$m$ connections with other processors in the cube. Hence, the worst case communications
time between any two processors is $O(\log_2 M)$. This places the communications time
between the high speed of the fully connected MIMD architecture and the ring architecture
while preserving the capability of scalability. The Intel iPSC/2 hypercube has a front end
processor that is directly connected to each node of the cube via a 10 Mbps ethernet
connection. Each node employs a Direct Connect Module (DCM) which frees a node’s CPU
from directing message traffic. Each of the nodes is made up of a standard Intel 80386
processor rated at 4 MIPS (9:441-451).
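
The binary cube interconnection can be sketched with node numbers written in binary: two nodes are directly linked when their numbers differ in exactly one bit, and the minimum number of links between any two nodes is the number of differing bits. The function names below are illustrative and are not part of any iPSC/2 library.

```python
def neighbors(node, dim):
    """Nodes directly linked to `node` in a hypercube of dimension `dim`."""
    return [node ^ (1 << bit) for bit in range(dim)]


def hops(src, dst):
    """Minimum number of links between two nodes (their Hamming distance)."""
    return bin(src ^ dst).count("1")


DIM = 3                  # an eight node cube, as used in this study
NODES = 2 ** DIM

print(neighbors(0, DIM))  # [1, 2, 4]
print(hops(0, 7))         # 3, the worst case for eight nodes
```

The worst case over all pairs equals the cube dimension, which is the logarithmic communications distance described above.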
2.6 Performance Measures


There are several measures of performance for parallel computing. The most common
measurement is speedup, which Hayes describes as the ratio of the total execution time on
a sequential computer to the corresponding execution time on a parallel computer using
M processors (11). Mathematically,

$$S(M) = \frac{T_s}{T_p} \qquad (1)$$

where $T_s$ is the sequential execution time and $T_p$ is the parallel execution time.
Since $T_p \ge T_s / M$, the speedup $S \le M$. Stone feels that the definition used by Hayes
and others leads to ambiguity because the definition provides for inflated values. Instead,
Stone states that speedup is the ratio of the best possible serial algorithm implementation
to the parallel implementation (21). A speedup which measures the performance of the
same algorithm implemented serially and in parallel should, according to Stone, be defined
as relative speedup. This thesis uses Stone's concept of relative speedup for performance
analysis.

Another  useful  performance  measure  is  efficiency  which  Hayes  describes  as  speedup 
per  degree  of  parallelism  (11:583),  defined  mathematically  as: 


$$E(M) = \frac{S(M)}{M} \qquad (2)$$
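
Both measures are simple ratios of measured run times. A minimal sketch follows; the timings used in the example are invented for illustration and are not results from this study.

```python
def speedup(t_sequential, t_parallel):
    """Speedup: sequential run time divided by parallel run time."""
    return t_sequential / t_parallel


def efficiency(t_sequential, t_parallel, processors):
    """Efficiency: speedup per processor, at most 1.0 in the ideal case."""
    return speedup(t_sequential, t_parallel) / processors


# Hypothetical timings: 120 s sequentially, 20 s on an eight node cube.
s = speedup(120.0, 20.0)
e = efficiency(120.0, 20.0, 8)
print(s, e)  # 6.0 0.75
```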


2.7 Taxonomy of Simulations and Simulation Models

A simulation may be either discrete or continuous. A discrete system allows the state
variables to change only at discrete points in time whereas a continuous system allows the
state variables to change continuously over time. A model is defined as a representation of
a system. A model need only include the aspects of a real system under observation whose
behavioral characteristics are intended for study. The model is hence a representation
of a real world entity but it is also a simplification of that entity. Models are typically
described by three attributes: static or dynamic, deterministic or stochastic, and discrete
or continuous. A static model represents an entity at a particular point in time (usually
referred to as a Monte Carlo model) whereas a dynamic model represents an entity as it
changes over time. The pool balls simulation uses a dynamic model. A deterministic model
has a known set of inputs and results in a unique set of outputs. There are no random
variables in a deterministic model. A stochastic model is probabilistic and relies upon one
or more random variables as inputs. Due to the randomness of the inputs, a stochastic
model must be considered only as an estimator of an entity's behavior. Statistical estimates
sought as outputs of a stochastic model include the mean time between failure, the mean
service time, or the mean wait time. The pool balls simulation uses a stochastic model as the pool
balls are generated with random positions and velocities. A simulation model may be either
discrete or continuous. A discrete simulation model represents an entity that changes only
at discrete points in time. A continuous model represents an entity that changes constantly
over time. Most queueing models are discrete. The pool balls model is continuous. It is
important to note that a continuous model may be observed only at discrete points in time
and a discrete model may be continuously observed over time. The pool balls simulation
is discrete but the model is continuous as each pool ball continuously changes with time
(2:3-12).

A simulation may be time driven or event driven. A time driven simulation updates
a dynamic simulation model by constant time intervals. With regard to the pool balls
simulation, a time driven implementation would move each pool ball by a predetermined
delta t. A time driven simulation may be allowed to process faster by increasing the delta
t value thereby requiring fewer updates over a specified time interval; however, resolution
of the simulation output decreases as the delta t increases.

An event driven simulation updates objects within a simulation model at discrete
points in time which have been defined as ‘events of interest’. If the events can be properly
defined, the event driven simulation promises theoretical improvement over its time driven
counterpart. This potential performance gain of the event driven approach arises from
only having to calculate the state information for the exact set of events of interest. The
time driven approach will calculate the state information for not only the set of events of
interest but also for all of the incremental states corresponding to the delta times (which
are not events of interest).

2.8 Distributed Discrete Event Simulation Paradigms

Chandy and Misra developed a conservative synchronization paradigm in 1979 for
the successful implementation of a distributed discrete event simulation. Jefferson et al
presented their optimistic synchronization paradigm in 1985. To date, all other proposed
paradigms are variations and extensions to the two original paradigms. The problem that
both paradigms overcome is the handling of out-of-sequence messages. Consider the pool
balls scenario presented in Figure 1.

Figure 1. A Partitioned Pool Table with a Future Collision

In Figure 1, suppose that pool balls 1 and 2 will collide with one another at time
t = 4. Let us assume that for the parallel implementation, the pool table is sliced into
sectors with one sector allocated to each of three nodes. Each node knows only of the
existence of its pool balls. The leftmost sector on node 0 cannot ‘see’ ball 2 on node 2 and
vice versa. This situation causes node 0 to predict that ball 1 will strike the top horizontal
cushion at time t = 5 while node 2 will predict that ball 2 will exit the sector at time
t = 1. If both node 0 and node 2 execute their events simultaneously, node 2 will be
correct and node 0 will be incorrect. Eventually, ball 2 will migrate to node 0 but the
collision between the two pool balls will no longer be possible because ball 1 has already
been simulated past the collision time of t = 4.0. The arrival of ball 2 at node 0 at time
t = 3.0 is an out-of-sequence message if node 0 has already simulated past time t = 3.0.

An  optimistic  strategy  assumes  that  out-of-sequence  messages  will  not  occur;  thus, 
every  node  in  the  distributed  system  processes  all  of  the  data  that  it  can.  As  each  event  is 
processed,  the  state  data  is  stored  in  memory.  If  an  out-of-sequence  message  does  occur  as 
it  would  in  the  scenario  of  Figure  1,  the  node  that  receives  the  message  reverses  (referred  to 
as  rollback)  the  simulation  back  to  the  last  event  executed  just  prior  to  the  out-of-sequence 
message.  The  simulation  is  then  recalculated  with  the  newly  arrived  message  data. 


A conservative strategy prevents out-of-sequence messages from occurring by
preventing processors from executing until such a time that they can safely guarantee that no
out-of-sequence messages will arrive. Thus, node 2 of Figure 1 would have been allowed to
process the exit event of ball 2, but node 0 would have been required to wait (i.e. to sit
idle) until it was safe to process. The mechanism used by Chandy and Misra to guarantee
system correctness is the Minimum Safe Time (MST) which is a calculated value of
simulation time which guarantees that no out-of-sequence messages will occur up to time
t < MST.

Both paradigms have strengths and weaknesses. The optimistic strategy requires
large amounts of memory for rollback and can exhaust the memory during the simulation.
Chandy and Misra have shown that their conservative paradigm requires only a bounded
amount of memory and does not require more memory than a sequential simulation (15).
Lipton and Mizell assert that Time Warp outperforms Chandy-Misra by a factor of p in
the best case and cannot lag arbitrarily far behind Chandy-Misra in the worst case (16).
This is based on the intuitive premise that Time Warp can ‘win big’ if it correctly guesses
which events to process and which events not to process.
Furthermore, even if a processor incorrectly processes an event, as in Figure 1, it is the
processor which has processed furthest in simulation time which is penalized; therefore,
the simulation is no slower than the slowest processor plus some constant overhead factor
to enforce the rollback.

Lin and Lazowska have taken a more analytical view but conclude
basically the same thing. Their conclusion is based upon models of the Time Warp and
Chandy-Misra paradigms which employ several underlying assumptions. Assumption 2.1
in Lin and Lazowska’s paper states that each logical process is assigned to a dedicated
processor. This assumption reduced the potential speedup of their model because not all
conservative models require a one to one mapping between logical processes and processors.
Let the number of logical processes be k and the number of processors be n. The probability
of an idle processor using Chandy-Misra decreases as n decreases such that k > n (15).
Therefore, Lin and Lazowska’s conclusions may be erroneous for conservative strategies
that can assign multiple processes to processors. Lin and Lazowska referenced this fact in
their concluding remarks. Assumption 2.2 in Lin and Lazowska’s paper states that Time
Warp can roll back a simulation in negligible time. This underlying assumption is perhaps
required to reduce the variables in an analytical model, but the assumption is not realistic.
Indeed, Lin and Lazowska state in their concluding remarks that the overhead of the Time
Warp operations is greater than that of the Chandy-Misra operation. This discrepancy has
been taken into consideration by Lipton and Mizell causing them to conclude that Time
Warp is always within a constant factor of optimal.

The question addressed by this thesis
is whether or not a conservative approach can also be made within a factor of optimal and
whether or not this factor can be higher than that of Time Warp. This thesis also demonstrates
that the pool balls simulation can be effectively partitioned such that multiple logical
processes can be assigned to each processor.

2.9  The  Theories  of  Chandy  and  Misra 

Each  process  in  a  physical  system  is  simulated  by  a  separate  logical  process.  Chandy 
and  Misra  use  the  term  LP  for  logical  processes  and  PP  for  physical  processes.  The  logic 
of an LP depends solely upon the PP that it is simulating. An LP_i has a communications
link to LP_j if and only if PP_i has a communications (dependency) link to PP_j. All
messages between LP_i and LP_j consist of a tuple (t, m) such that t represents the time of
the message and m represents the contents of the message. An LP can only process up to
the time of the latest tuple which was received. This condition is sufficient to guarantee
that no out-of-sequence messages will be received by any LP and simulation correctness
is thus guaranteed (6). These concepts form the basis for Chandy and Misra's original
publication in 1978 subject to the following constraints:

1. A process may decide to send a message at any arbitrary time t > 0 (6:440).

2. For all message tuples of a simulation time period [0, Z),

0 < t_1 < ... < t_k < Z (6:442).

3. A message is sent from LP_i to LP_j if and only if LP_i is ready to send the message
and LP_j is ready to receive it (6:443).

The third constraint stated above allows for the possibility of deadlock. Chandy and
Misra assert that all distributed discrete event simulations using a conservative paradigm
are subject to deadlocks and therefore require a mechanism to accommodate them. Chandy
and Misra provide three such mechanisms. The first two are straightforward: deadlock
detection and recovery, and deadlock avoidance. The third mechanism which is both favored
and pioneered by Chandy and Misra is the concept of NULL messages. Such a message
consists of the tuple (t, NULL) which does not exist in the physical system. The presence
of a NULL message allows LPs to continue processing up to the time of the NULL message
when they would otherwise be blocked. The following queueing network taken from Chandy
and Misra’s article serves to demonstrate this process (6:446).
1. Source outputs (50, m_1) to LP1.

2. LP1 outputs (50, m_1) to LP2.

3. LP2 outputs (55, m_1) to LP3. At this point, LP3 is still waiting to receive a message
from LP1.

4. Source outputs (100, m_2) to LP1.

5. LP1 outputs (100, m_2) to LP2.

6. LP2 waits to output (105, m_2) to LP3 because LP3 is not ready to receive another
message from LP2 until it first receives a message from LP1.

7. Source outputs (150, m_3) to LP1.

8. LP1 waits to output (150, m_3) to LP2 because LP2 is not ready to receive another
message from LP1 until (105, m_2) is sent to LP3.

At this point, the queueing network is deadlocked because LP3 is expecting a message from
LP1 which it never received; therefore, LP3 cannot accept LP2's second message, and LP2
in turn cannot accept LP1's third message. LP1 will never be able to send LP3 a message because
it is waiting for LP2 to accept LP1's third message. Every LP is thus waiting upon every
other LP.

Insertion of NULL messages in Chandy and Misra's example avoids the possibility
of deadlock. At the time of the arrival of the first message m_1 at LP1, LP1 determined
the message should be addressed to LP2. Even though this message was not the type
required by LP3, LP1 can still send a NULL message to LP3 at time t = 50. This
would have allowed LP2 to send the tuple (105, m_2) to LP3 and LP3 could have then
received the tuple. It should be clear that the NULL messages corresponding to the
tuples (50, NULL), (100, NULL) and (150, NULL) are sufficient to avoid the deadlock
situation. The drawback to the NULL message approach is that logical processes are
required to process more messages than exist in the physical system. Such approaches are
ill suited towards coarse grain machines due to the excessive message traffic which can
result.
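
The effect of a NULL message on a blocked LP can be sketched in C as follows. This is a hedged illustration only: the structure, the function names, and the rule that an LP may safely process up to the minimum of the latest times received on its input links are assumptions in the spirit of the description above, not code or notation taken from Chandy and Misra's paper or from the thesis software.

```c
#include <stddef.h>

#define NUM_LINKS 2

/* Hypothetical link indices for LP3 in the queueing network example. */
enum { FROM_LP1 = 0, FROM_LP2 = 1 };

typedef struct {
    /* Time of the latest tuple received on each input link. */
    double link_clock[NUM_LINKS];
} LogicalProcess;

/* Record a tuple (t, m). A NULL message is modeled as m == NULL; it
 * carries no content but still advances the clock of its link. */
void receive(LogicalProcess *lp, int link, double t, const char *m)
{
    (void)m;  /* the content does not affect the link clock */
    if (t > lp->link_clock[link])
        lp->link_clock[link] = t;
}

/* Assumed rule: the LP may process up to the smallest link clock,
 * since no link can later deliver a message with an earlier time. */
double safe_time(const LogicalProcess *lp)
{
    double t = lp->link_clock[0];
    for (int i = 1; i < NUM_LINKS; i++)
        if (lp->link_clock[i] < t)
            t = lp->link_clock[i];
    return t;
}
```

In the example above, LP3 holding only (55, m_1) from LP2 has a safe time of 0 and is blocked. Once the tuple (50, NULL) arrives from LP1 the safe time rises to 50, which is what allows the network to keep moving.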

Chandy  and  Misra  expanded  their  theory  and  presented  a  follow  up  paper  in  1981  (5). 
They  developed  a  new  constraint  such  that  within  a  physical  system,  the  ‘...behavior 
of  a  PP  at  time  t  cannot  be  influenced  by  messages  transmitted  to  it  after  t  (5:198)’. 
This necessary condition is called the realizability condition. This leads indirectly to the
assertion that if LP_i sends LP_j a message (t_k, m_k), it implies that all messages from
PP_i to PP_j have been simulated up to time t_k (5:199).

2.10  Event  Modeling 

Schruben  defines  a  system  as  a  set  of  entities.  Entities  may  fall  into  one  of  two 
general  categories  referred  to  as  resident  and  transient.  A  resident  entity  is  considered  to 
have  the  property  of  permanent  existence.  For  example,  a  simulation  of  a  factory  might 
model  the  machines  in  the  factory  as  resident  entities  since  the  machines  are  always  there. 
A  transient  entity  is  not  permanent.  Thus  the  factory  simulation  might  instead  model  the 
behavior  of  the  parts  as  they  pass  from  machine  to  machine  (20:101-102).  With  respect  to 
the  pool  balls  simulation,  this  corresponds  to  modeling  the  pool  table  sectors  as  resident 
entities  or  modeling  the  behavior  of  the  pool  balls  as  transient  entities.  Schruben  maintains 
that  both  viewpoints  are  equally  valid  and  both  viewpoints  should  be  considered  during 
the  simulation  design  phase. 

2.11  Spatial  Partitioning 

Wieland  and  Hawley  researched  the  application  of  sectoring  a  battlefield  for  the 
STB89  tactical  battle  simulation  (22).  Each  object  in  the  simulation  has  a  ‘perception 
radius’  which  defines  the  range  that  an  object  can  detect  another  object.  As  an  object 
approaches  a  partition  border,  the  perception  radius  eventually  becomes  tangent  to  the 
border.  If  the  object  continues  to  move  toward  the  border,  the  perception  radius  will 
protrude  into  the  adjacent  sector.  This  condition  requires  that  the  object  have  knowledge  of 
all objects residing within the adjacent sector. Wieland identified two techniques available
to  accommodate  this  condition.  The  first  is  to  have  the  sector  owning  the  moving  object 
receive  a  copy  of  all  of  the  objects  residing  within  the  adjacent  sector.  This  technique  is 
presumed  to  yield  poor  results  because  the  search  space  of  the  original  sector  will  approach 
the  search  space  of  a  non-partitioned  battlefield  thereby  negating  any  potential  gains.  The 
second  approach  that  Wieland  identified  is  to  provide  the  second  sector  with  a  copy  of  the 
object  which  is  moving  towards  it.  Hence,  only  one  object  is  passed  and  the  search  space 
of both sectors has at most an O(1) increase. Wieland’s strategy identifies three critical
events  as  shown  in  Figure  3  (22:3). 


Figure  3.  Wieland’s  Grid  to  Grid  Proximity  Detection 

Wieland states that the first event occurs whenever a part of an object’s perception
radius is tangent to the sector boundary. At this time, an ‘Add_Unit’ message is sent
to the adjacent sector; however, the adjacent sector does not ‘control’ the object added.
Wieland refers to this additional object message as ‘data replication’ since the object exists
on two processes (sectors). The second event occurs when the object’s center (i.e. that
which defines the object’s location) crosses the sector border. At this time, a change of
ownership message is sent from the original sector to the gaining sector. The third event
occurs when the object’s perception radius is again tangent to the sector border. At this
time, a ‘Delete_Unit’ message is sent from the gaining sector to the losing sector.
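
For an object moving in a straight line at constant velocity, the times of these three events follow directly from the geometry. The C sketch below is a simplified illustration under stated assumptions: the object moves in the positive X direction toward a vertical border, and the structure and function names are hypothetical rather than taken from Wieland's implementation.

```c
/* Times of the three critical events for one border crossing. */
typedef struct {
    double t_add;     /* perception radius first tangent to the border  */
    double t_owner;   /* object's center crosses the border             */
    double t_delete;  /* perception radius again tangent to the border  */
} BorderEvents;

/* Object center at x0 moving with velocity vx > 0 toward a border at
 * x_border, with perception radius r. Times are relative to now. */
BorderEvents border_events(double x0, double vx, double x_border, double r)
{
    BorderEvents e;
    e.t_add    = (x_border - r - x0) / vx;
    e.t_owner  = (x_border - x0) / vx;
    e.t_delete = (x_border + r - x0) / vx;
    return e;
}
```

An object at x = 2 with velocity 2 and perception radius 1, approaching a border at x = 10, would trigger the three messages at times 3.5, 4.0 and 4.5 respectively.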

Wieland  added  a  comment  in  his  analysis  stating  that  the  second  event  which  he 
identified  in  his  data  partitioning  and  replication  strategy  could  be  eliminated.  There  were 
no  comments  regarding  any  experimental  studies  concerning  this  latter  assertion. 
III. Requirements Analysis


3.1 Introduction

The  Air  Force  Institute  of  Technology  is  interested  in  developing  techniques  for  the 
successful  design  and  implementation  of  parallel  discrete  event  simulations  beneficial  to 
the Department of Defense. Two specific applications include hardware design simulations
(using VHDL, for example) and battlefield simulations. The Institute is currently
emphasizing distributed simulations incorporating the conservative synchronization strategy
rather than the optimistic strategy. Unfortunately, there has been little reported in
the  literature  on  either  design  or  implementation  for  non-queueing  theoretic  models  for 
discrete  event  simulations  which  use  the  conservative  paradigm.  Several  questions  need 
to  be  addressed  before  attacking  the  parallelization  of  large  software  systems  such  as  the 
battlefield  simulations  used  by  the  DoD.  These  questions  include: 

1.  How  can  (or  should)  the  problem  domain  be  partitioned? 

2.  Can  a  distributed  simulation  achieve  comparable  (or  superior)  performance  using  a 
conservative  strategy  rather  than  an  optimistic  strategy? 

3.  Can  speedup  be  achieved  to  a  large  enough  degree  to  make  the  parallelizing  of  existing 
DoD  simulations  worthwhile? 

4.  Are  successful  distributed  simulations  incorporating  a  conservative  strategy  scalable? 

Insights into the above questions may be found through the design and implementation
of a small scale simulation. The classical pool balls simulation is ideal subject
matter because it has many of the same processes that a battlefield simulation has. These
processes include the handling of moving objects through space (albeit two dimensional),
search algorithms for object event identification, geographic domain structure and application
of spatial partitioning with limited data replication. Furthermore, the pool balls
problem  domain  is  well  understood  and  documented  and  test  results  are  available  for  the 
comparison  between  an  optimistic  strategy  and  a  conservative  one.  This  will  also  be  the 
first  research  effort  at  the  Institute  for  the  design  and  implementation  of  a  non-queueing 
problem  incorporating  an  event  list  with  the  conservative  paradigm;  therefore,  this  research 
will  provide  valuable  experience  to  the  Institute  for  follow  on  work. 
3.2 Requirements

The  following  requirements  were  established  for  this  project. 

1.  Comply  with  Stated  Requirements  of  Cal  Tech’s  Pool  Ball  Experiment.  It 
was  desirable  to  maintain  consistency  with  Cal  Tech’s  simulation  for  all  matters  of 
relevance  so  that  a  comparison  could  be  made  between  the  two  paradigms  used  (3). 
Some  of  the  stated  requirements  of  the  Cal  Tech  experiment  were  not  considered 
relevant  and  were  thus  ignored.  Two  specific  instances  include  Cal  Tech’s  requirement 
that  the  pool  balls  have  variable  radius  and  mass.  The  following  requirements  were 
extracted  from  Cal  Tech’s  requirements. 

(a) Each pool ball has measurable size and measurable mass (i.e. the pool balls are
not point particles).

(b)  Collisions  between  pool  balls  are  perfectly  elastic  thereby  conserving  energy  and 
momentum. 

(c) The pool balls move without friction.

(d)  Rotational  energy  of  pool  balls  is  ignored. 

(e)  The  enforcement  of  collisions  follows  the  physical  properties  of  elastic  collisions 
(i.e.  the  collisions  are  realistic). 

(f)  The  pool  table  has  no  ‘pockets’;  therefore,  the  number  of  pool  balls  for  any 
given  simulation  does  not  change  for  the  duration  of  the  simulation. 

(g)  Every  pool  ball  occupies  a  unique  space  on  the  table  and  no  two  balls  can  occupy 
a  portion  of  the  same  space  (i.e.  overlap  is  not  allowed). 

2.  The  Simulation  Will  Support  Variable  Quantities  of  Pool  Balls.  The  upper 
bound  on  the  number  of  pool  balls  is  specified  in  terms  of  the  memory  available  for 
dynamic  allocation  of  pool  ball  instantiations  and  what  will  physically  fit  on  the  pool 
table. 

3.  The  Dimensions  of  the  Pool  Table  Will  Be  Modifiable.  Although  the  length 
and  width  of  the  pool  table  is  not  deemed  a  highly  dynamic  variable,  the  capability  for 
changing  the  length  and  width  is  required.  This  factor  allows  for  changing  densities 
of  pool  balls  on  the  table  as  well  as  the  ability  to  expand  the  pool  table  so  that  more 
pool  balls  can  physically  be  located  on  it. 
4. The Radius and Mass of Each Pool Ball Will Be Equal.

5. This Simulation Will Incorporate a Conservative Synchronization Strategy.

6.  The  AFIT  Generic  Simulation  Shell  Will  be  Used  Both  in  the  Simulation 
Design  and  Implementation.  AFIT  has  developed  a  generic  simulation  driver  for 
discrete  event  simulations  for  standardization  purposes.  This  driver  is  object  oriented 
and  consists  of  a  next  event  queue,  a  clock,  an  event  manager,  and  a  simulation 
controller  that  sequences  through  the  simulation.  The  application  specific  software 
interfaces  to  AFIT’s  simulation  driver. 

3.3 Developing the Equations of Motion

The requirements for this thesis demand that the collisions between pool balls conform
to the principles of elastic collision and frictionless motion. Since the development
of the equations of motion was not specifically stated in previous literature, it is shown
here to support this research. The initial equations used can be found in most elementary
physics books.

The equations for conservation of energy and momentum for two pool balls B and b
having initial velocity vectors V_0 and v_0 and final velocity vectors V_1 and v_1 are respectively
as follows:

$$\frac{1}{2} m v_0^2 + \frac{1}{2} m V_0^2 = \frac{1}{2} m v_1^2 + \frac{1}{2} m V_1^2 \tag{3}$$

$$m v_0 + m V_0 = m v_1 + m V_1 \tag{4}$$

Equations (3) and (4) form the basis to develop algorithms for solving the events of pool
balls colliding with cushions and pool balls colliding with one another.

Another useful equation is that of frictionless motion on a two dimensional plane.

$$\begin{aligned} X_1 &= X_0 + V_x \, \Delta T \\ Y_1 &= Y_0 + V_y \, \Delta T \end{aligned} \tag{5}$$
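
The frictionless motion equation translates almost directly into code. The following C sketch assumes a hypothetical `Ball` structure holding a time tag, position and velocity; the structure layout and the function name are illustrative assumptions, not the data structures of the thesis software.

```c
/* Hypothetical ball state: time tag, center position and velocity. */
typedef struct {
    double t;        /* ball time tag         */
    double x, y;     /* center of the ball    */
    double vx, vy;   /* velocity components   */
} Ball;

/* Move a ball forward by dt using the frictionless motion equation:
 * X1 = X0 + Vx * dT and Y1 = Y0 + Vy * dT. Velocities are unchanged. */
void advance_ball(Ball *b, double dt)
{
    b->x += b->vx * dt;
    b->y += b->vy * dt;
    b->t += dt;
}
```

A ball at (1.0, 2.0) with velocity (3.0, -1.0) advanced by 0.5 time units ends at (2.5, 1.5) with its time tag increased by 0.5.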

The set of events S contains five event types which have been defined as events of
interest. These events are S = {VERT, HOR, COLL, PART, EXIT}. These event types
correspond to vertical cushion collisions, horizontal cushion collisions, collisions between
pool balls, reaching a sector boundary and crossing a sector boundary respectively. The
partition and exit events are only used for a partitioned pool table in accordance with
Wieland’s two step sectoring strategy (22). The following sections derive the equations
used to calculate the time at which each of the five event types will occur and the
equations used to calculate the ball state information for a pool ball that executed an event type.
It is assumed that the pool table is a rectangle with its top and bottom cushions parallel
to the X-axis and its left and right cushions parallel to the Y-axis. Throughout this thesis,
the top and bottom cushions are referred to as ‘horizontal cushions’ while the left and
right cushions are referred to as ‘vertical cushions’. Figure 4 shows the layout of the pool
table on an X-Y coordinate system. Each ball has its position defined by the X and Y
coordinates of the ball's center. Each pool ball has the following state information:


• ball time tag

• X

• Y

• V_x

• V_y

The pool balls algorithm calculates the time of the next event for each pool ball. Each
pool ball will have an event corresponding to one of the event types in S. To determine
which of the event types will occur for any given pool ball, an event time is calculated for
each of the five event types. By definition of monotonicity, the earliest calculated event
time for the set S defines the next possible event for a pool ball. The time t_n of each event
type in S is determined by t_n = t_{n-1} + ΔT where ΔT is calculated using equation 5 and t_{n-1}
is the current ball time tag.
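
Selecting the next event for a ball then amounts to taking the minimum over the five candidate delta times. The C sketch below illustrates this under stated assumptions: the enumeration mirrors the set S, `INFINITY` is used as a hypothetical marker for an event type that cannot occur, and the function name and array layout are not taken from the thesis software.

```c
#include <math.h>

/* The five event types of the set S. */
typedef enum { VERT, HOR, COLL, PART, EXIT, NUM_EVENT_TYPES } EventType;

/* dt[i] holds the delta T computed for event type i, or INFINITY when
 * that event type cannot occur for this ball. Returns the event type
 * with the earliest time and stores t_n = t_prev + dT in *t_next. */
EventType next_event(const double dt[NUM_EVENT_TYPES], double t_prev,
                     double *t_next)
{
    EventType best = VERT;
    for (int i = 1; i < NUM_EVENT_TYPES; i++)
        if (dt[i] < dt[best])
            best = (EventType)i;
    *t_next = t_prev + dt[best];
    return best;
}
```

For a ball with time tag 10.0 and candidate delta times of 2.5 (VERT), 1.25 (HOR), none (COLL), 0.75 (PART) and 3.0 (EXIT), the next event is the partition event at time 10.75.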

3.3.1 Event Calculations for Collisions with Cushions. This section develops the
equations used to calculate the time at which a pool ball will strike any of the four table
cushions and the ball state information after striking any of the four table cushions.

3.3.1.1 Calculating ΔT to Strike a Cushion. The X-axis coordinate is known
for both the left and right vertical cushions. These values are 0.00 and X_table_length
respectively where X_table_length is the user defined length of the pool table. The velocity V_x of
a pool ball determines whether the ball will strike the left or right vertical cushion. The
known X-axis coordinate for the appropriate vertical cushion is substituted into equation 5
as X_1. Using equation 5, the delta time of the collision is defined by

$$\Delta T = \frac{X_1 - X_0}{V_x}$$

Figure 4. Layout of a Pool Table on an X/Y Axis System

The Y-axis coordinate is known for both top and bottom horizontal cushions. These
values are 0.00 and Y_table_width respectively where Y_table_width is the user defined width of
the pool table. The velocity V_y of a pool ball determines whether the ball will strike the top
or bottom horizontal cushion. The known Y-axis coordinate for the appropriate horizontal
cushion is substituted into equation 5 as Y_1. Using equation 5, the delta time of the collision
is defined by

$$\Delta T = \frac{Y_1 - Y_0}{V_y}$$
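
A minimal C sketch of these two calculations follows. It takes the cushion coordinate itself as X_1 or Y_1, as the text describes; the function names, the use of the sign of the velocity to pick the cushion, and the `INFINITY` return value for a ball with no motion along the axis are assumptions made for illustration.

```c
#include <math.h>

/* Delta T until a ball at x0 with velocity vx reaches the vertical
 * cushion it is heading toward (cushions at 0 and table_length).
 * Returns INFINITY when the ball has no motion along the X-axis. */
double dt_vertical_cushion(double x0, double vx, double table_length)
{
    if (vx == 0.0)
        return INFINITY;
    double x1 = (vx > 0.0) ? table_length : 0.0;
    return (x1 - x0) / vx;
}

/* Delta T until a ball at y0 with velocity vy reaches the horizontal
 * cushion it is heading toward (cushions at 0 and table_width). */
double dt_horizontal_cushion(double y0, double vy, double table_width)
{
    if (vy == 0.0)
        return INFINITY;
    double y1 = (vy > 0.0) ? table_width : 0.0;
    return (y1 - y0) / vy;
}
```

For a table of length 10, a ball at X = 2 moving with V_x = 4 yields a delta time of 2.0, while the same ball moving with V_x = -4 yields 0.5.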
3.3.1.2 Calculating the State Information for Cushion Collisions. The ball
time tag is simply replaced with the time of the event to be executed. This time is stored
as a parameter with the event message. The parameters X and Y can be calculated directly
from equation 5. The velocities V_x and V_y are calculated indirectly from equations 3 and
4. Let V be the velocity of the cushion and v be the velocity of the ball object. The mass
of the cushion is much greater than the mass of the ball; therefore, the following equations
can be used for a vertical cushion event.

$$\begin{aligned} v_{x1} &= -v_{x0} \\ v_{y1} &= v_{y0} \end{aligned} \tag{6}$$

If the event type is a horizontal cushion event, then equation 7 defines the new values of
v_x and v_y.

$$\begin{aligned} v_{x1} &= v_{x0} \\ v_{y1} &= -v_{y0} \end{aligned} \tag{7}$$

Equation 6 requires the X-axis velocity to change direction while the Y-axis velocity
remains unchanged. Equation 7 is just the opposite.
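
The velocity update for a cushion event is small enough to show in full. The C sketch below is illustrative only; the enumeration and function name are hypothetical and the ball's velocity is passed as two plain doubles rather than through the thesis data structures.

```c
/* The two cushion event types. */
typedef enum { VERT, HOR } CushionEvent;

/* Apply the velocity change for a cushion collision in place. */
void apply_cushion_event(CushionEvent type, double *vx, double *vy)
{
    if (type == VERT)
        *vx = -*vx;   /* vertical cushion: reverse the X-axis velocity   */
    else
        *vy = -*vy;   /* horizontal cushion: reverse the Y-axis velocity */
}
```

Because only the sign of one component changes, the speed of the ball, and hence its kinetic energy, is the same before and after the event.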

3.3.2 Calculations for Partition and Exit Events. This section develops the
equations used to calculate the time at which a pool ball will reach a sector border (partition
event), or depart a sector to an adjacent one (exit event). This section also develops the
equations used to update the ball state information after a partition or exit event.

3.3.2.1 Calculating ΔT for Partition and Exit Events. The X-axis coordinate
is known for both the left and right borders of any interior sector. These values are
determined dynamically during initialization based upon the user specified table dimensions
and the number of sectors desired. For partition events, a pool ball is moved to a
location corresponding to (X_left_border + R) or (X_right_border - R) where R is the specified pool
ball radius. For exit events, a pool ball crosses a partition into the neighboring sector
and moves a distance of 2R. The new X-axis coordinates will be (X_left_border - R) or
(X_right_border + R). The velocity V_x of a pool ball determines whether the ball will reach
the left or right border, or exit the sector to the left or to the right. Using equation 5, the
delta time of the partition or exit event is defined by

$$\Delta T = \frac{X_1 - X_0}{V_x}$$
3.3.2.2 Calculating the State Information. The ball time tag is simply
replaced with the time of the event to be executed. The parameters X and Y can be
calculated directly from equation 5. Since partition and exit events are not associated
with collisions, the pool ball velocities V_x and V_y do not change.
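
The partition and exit calculations of this section can be sketched together in C. The structure and function names below are hypothetical, and the sketch assumes the ball has not yet reached its partition point; a ball already past that point would produce a negative partition time that the caller would have to discard.

```c
#include <math.h>

/* Delta times of the partition and exit events for one ball. */
typedef struct {
    double dt_partition;  /* ball edge reaches the sector border         */
    double dt_exit;       /* ball has moved 2R further, across the border */
} SectorTimes;

/* Ball center at x0 with X-axis velocity vx and radius r, inside a
 * sector bounded by x_left and x_right. INFINITY marks "no event"
 * for a ball with no motion along the X-axis. */
SectorTimes sector_event_times(double x0, double vx,
                               double x_left, double x_right, double r)
{
    SectorTimes s = { INFINITY, INFINITY };
    if (vx > 0.0) {
        s.dt_partition = (x_right - r - x0) / vx;
        s.dt_exit      = (x_right + r - x0) / vx;
    } else if (vx < 0.0) {
        s.dt_partition = (x_left + r - x0) / vx;
        s.dt_exit      = (x_left - r - x0) / vx;
    }
    return s;
}
```

For a ball of radius 1 at X = 5 in a sector spanning 0 to 10, a velocity of 2 in either direction gives a partition event after 2.0 time units and an exit event after 3.0.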

3.3.3  Event Calculations for Collisions Between Pool Balls  To determine if a pool
ball will strike another pool ball, the two point formula for the distance between two points
can be used. This formula is stated as equation (8).

l^2 = (X_1 - x_1)^2 + (Y_1 - y_1)^2    (8)

where l is the straight line distance between the centers of the two colliding pool balls at
the point of impact. This is shown in Figure 5.


Figure 5.  Two Balls Colliding at the Point of Impact


3.3.3.1  Calculating the Time of Impact for a Ball Collision  The location of a
pool ball is defined by the coordinates of its center of mass; therefore, the linear distance
separating two balls at the moment of impact is the sum of the radii. Since every pool
ball for this simulation has equal radius, the value of l^2 in equation (8) is 4R^2. Equation
(8) is most easily implemented if the two colliding pool balls have the same initial logical
times; otherwise, the difference in logical times must be accounted for as variables. The
possibility exists that two colliding pool balls could have different initial logical times. To
accommodate this possibility, the pool ball having the lesser logical time is moved by a
delta t such that its logical time is equal to that of the other pool ball. The values of
X_1, x_1, Y_1 and y_1 (after both pool balls have equal logical times) must be substituted
with those of equation (5). Solving for \Delta T results in equation (9).

0 = a \Delta T^2 + b \Delta T + c    (9)

where

a = V_x^2 + V_y^2 - 2 v_x V_x - 2 v_y V_y + v_x^2 + v_y^2

b = 2 X_0 V_x + 2 Y_0 V_y - 2 X_0 v_x - 2 Y_0 v_y - 2 x_0 V_x - 2 y_0 V_y + 2 x_0 v_x + 2 y_0 v_y

c = X_0^2 + Y_0^2 - 2 x_0 X_0 - 2 y_0 Y_0 + x_0^2 + y_0^2 - 4R^2

The values of a, b and c are simply the coefficients to the quadratic formula from which \Delta T
may easily be solved. From an algorithmic point of view, only real roots to the quadratic
solution represent viable collision times. As such, the discriminant must be inspected for
non-negative values. If the quadratic solution consists of two real roots, the lesser of the
two represents the delta time at which the two pool balls in question will collide.

3.3.3.2  Calculating the State Information after a Ball Collision  Once two
pool balls collide with one another, both velocity vectors will change. Solving for the new
velocity vectors in the X/Y coordinate system is most easily done if the coordinate system
is rotated to form a new R/P orthogonal system where R is the axis along the line connecting
the centers of the two pool balls at the point of collision and P is the axis which is perpendicular to
R. It will be shown that the vector components V_R and v_R are simply interchanged as a
result of a pool ball collision.

To solve for the new velocity vectors, equations (3) and (4) are used where |v_0|^2 =
v_{R0}^2 + v_{P0}^2 and |V_0|^2 = V_{R0}^2 + V_{P0}^2 in equation (3). After substituting the values of |v_0|^2 and
|V_0|^2 into equation (3), the equations for conservation of energy and momentum become
that of equation (10) and (11) in terms of the new R/P coordinate system.

v_{R0}^2 + V_{R0}^2 + v_{P0}^2 + V_{P0}^2 = v_{R1}^2 + V_{R1}^2 + v_{P1}^2 + V_{P1}^2    (10)

\begin{bmatrix} v_{R0} + V_{R0} \\ v_{P0} + V_{P0} \end{bmatrix} = \begin{bmatrix} v_{R1} + V_{R1} \\ v_{P1} + V_{P1} \end{bmatrix}    (11)

Equations (10) and (11) can be easily manipulated to produce equation (12).

[Equation (12): matrix form not recoverable from the source.]    (12)


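Solving for V_{R1} and V_{P1} yields V_{R1} = v_{R0} and V_{P1} = V_{P0}. Substituting these values into
equation (11) results in equation (14) which represents the resultant velocity vectors of the
two colliding pool balls in terms of the R/P coordinate system.

\begin{bmatrix} V_{R1} \\ V_{P1} \end{bmatrix} = \begin{bmatrix} v_{R0} \\ V_{P0} \end{bmatrix}    (13)

\begin{bmatrix} v_{R1} \\ v_{P1} \end{bmatrix} = \begin{bmatrix} V_{R0} \\ v_{P0} \end{bmatrix}    (14)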


3.3.3.3  Rotating the X/Y Coordinate System  Figure 6 illustrates the rotation
of the X/Y coordinate system to form the new R/P coordinate system. The line connecting
the two pool balls' centers is one of the desired orthogonal axes. Let this line be L. Let \theta
be the angle made with this line and the X-axis. Then

\theta = \arctan\left(\frac{Y - y}{X - x}\right)    (15)

Let \phi be the angle made with the velocity vector V and the X-axis. Then

\phi = \arctan\left(\frac{V_y}{V_x}\right)    (16)

Let \varphi be the angle made with the velocity vector V and the line L. Then

\varphi = \phi - \theta    (17)

Figure 6.  Translating an X/Y Axis System


The new orthogonal reference system has axes R and P where R is the axis along the
line L connecting the two balls and P is perpendicular to R. The new velocity vector of
a given ball is then defined by V_R and V_P which are clearly the following.

\begin{bmatrix} V_R \\ V_P \end{bmatrix} = \begin{bmatrix} V \cos(\varphi) \\ V \sin(\varphi) \end{bmatrix}    (18)


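The velocity components V_R and v_R may now be interchanged in accordance with
equations (13) and (14); however, it is more convenient to first decompose V_R and V_P into their
respective X and Y components yielding equation (19).

\begin{bmatrix} V_{xR} \\ V_{yR} \end{bmatrix} = \begin{bmatrix} V_R \cos(\theta) \\ V_R \sin(\theta) \end{bmatrix}, \qquad \begin{bmatrix} V_{xP} \\ V_{yP} \end{bmatrix} = \begin{bmatrix} -V_P \sin(\theta) \\ V_P \cos(\theta) \end{bmatrix}    (19)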


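Once V_{xR} and V_{yR} are interchanged with v_{xR} and v_{yR}, the resultant velocities must
be translated back into the X/Y orthogonal axis system resulting in equation (20).

\begin{bmatrix} V_x \\ V_y \end{bmatrix} = \begin{bmatrix} v_{xR} + V_{xP} \\ v_{yR} + V_{yP} \end{bmatrix}, \qquad \begin{bmatrix} v_x \\ v_y \end{bmatrix} = \begin{bmatrix} V_{xR} + v_{xP} \\ V_{yR} + v_{yP} \end{bmatrix}    (20)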


3.4  Simulation Environment

A simulation environment existed for all distributed discrete event applications. This
environment was required to be used in an effort to standardize software from various
research efforts. The environment is object oriented in the C programming language. The
objects defined in the environment include a simulation driver, a clock, a next event queue
and a generic event which must be tailored to a specific application.


3.4.1  Simulation Driver


Functional Description: The driver forms a basic conditional loop construct. At each
iteration, an event is executed, a new event determined, and another event is executed
until a DONE event is reached. A DONE event signifies that the execution of another
event would set the simulation clock beyond the user specified simulation run time.
The loop exits, and control of the program is returned to the iPSC/2 host processor.


Functional Description: The clock object manages all aspects of the simulation clock.
The simulation time is updated each time the clock object is called. The simulation
time is a double precision floating point variable.

Attributes: Time.

Operations:

• init_time
• set_time
• adv_time
• get_time

3.4.2  The Next Event Queue Object


Functional Description: The next event queue (NEQ) object stores all scheduled events.
The events are insertion sorted by simulation time (next event time). The NEQ is
implemented as a singly linked list.

Attributes: None.

Operations:

• init_neq
• show_neq
• add_event
• get_event
• count_event
• neq_time
• simultaneous
• count
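
An insertion-sorted singly linked list of the kind described can be sketched as follows. The operation names are borrowed from the list above, but the signatures, the `Event` and `Neq` layouts, and the first-in-first-out handling of equal times are assumptions of this sketch rather than details taken from the simulation environment.

```c
#include <stdlib.h>

typedef struct Event {
    double time;            /* next event time (sort key) */
    int id;
    int type;
    struct Event *next;
} Event;

typedef struct { Event *head; } Neq;

void init_neq(Neq *q) { q->head = NULL; }

/* Insert in ascending time order; events with equal times keep their
 * arrival order. Returns 0 on success, -1 if allocation fails. */
int add_event(Neq *q, double time, int id, int type)
{
    Event *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->time = time;
    e->id = id;
    e->type = type;

    Event **link = &q->head;
    while (*link != NULL && (*link)->time <= time)
        link = &(*link)->next;
    e->next = *link;
    *link = e;
    return 0;
}

/* Detach and return the earliest event (NULL if empty).
 * The caller owns the returned event and must free it. */
Event *get_event(Neq *q)
{
    Event *e = q->head;
    if (e != NULL)
        q->head = e->next;
    return e;
}

int count_event(const Neq *q)
{
    int n = 0;
    for (const Event *e = q->head; e != NULL; e = e->next)
        n++;
    return n;
}
```

Insertion walks the list, so it costs time linear in the queue length, while removal of the earliest event is constant time.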

3.4.3  The  Event  Object 

Functional Description: The Event object as implemented is actually a class of objects
such that each desired event must be instantiated from the ‘event’ operations. Each
instantiation becomes an object.

Attributes:

• Time
• ID
• Type of event
• Pool Ball ID(s)

Operations:

• new_event (allocates memory for an event)
• show_event
• zap_event (deletes memory allocation for an event)

IV.  Software Design


4.1  Introduction 

This chapter outlines the design steps used to develop the parallel pool balls simulation.
The pool balls simulation was designed in three incremental steps.

1.  Development  of  a  sequential  simulation  without  partitioning  or  data  replication. 

2.  Development  of  a  sequential  simulation  with  partitioning  and  data  replication. 

3.  Development of a parallel simulation with partitioning, data replication and
conservative synchronization.

Each of the above designs was object oriented. An object oriented design (OOD) approach
was selected to enhance software maintenance capabilities for follow-on research of the
pool balls concept. This chapter describes the evolution of the sequential, non-partitioned
simulation into the parallel implementation by first analyzing the sequential version and
then by highlighting the design changes required to implement the follow-on versions.

4.2  Design of a Sequential Simulation without Spatial Partitioning

The simulation environment was object oriented. An object oriented application
was therefore easier to interface than alternative designs such as functional, top down, or
Jackson (JSD). The OOD pool balls application defined objects representing a ball object,
ball object manager, table sector, table sector manager, random number generator and an
event handler object. The table object creates the pool table for the appropriate or specified
table dimensions and stores all of the boundary information related to the table. The ball
object creates all of the pool balls for the simulation and stores them in a data structure.
All operations related to the management of pool ball objects take place here. The random
number generator is a machine independent pseudo-random number generator developed
by Law and Kelton (14). This random number generator is capable of generating uniform,
lognormal, exponential and normal distributions having a lower and upper bound. The
event_handler object determines the next event of interest and executes an event passed to
it.

4.2.1  Design of the Ball Object


Functional Description: The ball object is an instantiation of a class of pool balls. Each
pool ball object is dynamically allocated and deallocated to and from memory.

Attributes:

• Radius
• Ball_ID
• Ball_Time
• X
• Y
• Vx
• Vy
• Do_I_See_It
• Do_I_Own_It
• Who_Owns_It

Operations: None.

4.2.2  Design of the Ball Object Manager

Functional Description: The ball manager is an abstract data type whose sole purpose
is to store the pool balls which are assigned to it.

Attributes: None.

Operations:

• Initialize
• Add
• Remove
• Reset
• Increment
• Set
• More
• Get_and_Delete
• Get_Next_Ball
• Get_This_Ball
• Print_Balls_In_This_Sector
• Head
• Tail
• Is_Empty
• Is_Found
• Length_Of
• Check_This_Ball

Detailed Design: It was desirable to design data structures for the sequential, non-partitioned
application which required as few changes as possible to accommodate the
more complex sequential, partitioned version and the parallel, partitioned version of the
software. It was also desired to minimize the time spent in the search, add and delete
functions for the ball object manager. These functions require searching and traversing the
data structure which stores the ball objects. For the relatively simple case of the sequential,
non-partitioned application, an array structure minimizes the search and traversal time.
This is not the case for a partitioned application because each sector has a high probability
of containing only a subset of pool balls. Let N be the total number of pool balls created and
let M be the number of pool balls in any given sector at some instant in time. Then M <= N. Three
data structures were considered during the design phase: an array, a linked list and an
indexed linked list. The time complexities for each of the three data structures considered
are shown in Table 1 for the search and traversal operations.

Table 1.  Order of Analysis for the Ball Manager Data Structure

Data Structure    Time to Search    Time to Traverse
Array             O(1)              O(N)
Linked List       O(M)              O(M)
Indexed LL        O(1)              O(M)

From Table 1, the indexed linked list provides superior time complexity for both
searching and traversing, especially when N > M. Therefore, the indexed linked list data
structure was selected even though the initial application uses only a single sector, negating
any advantages of the indexed linked list over the simple array. The data structure is shown
pictorially in Figure 7. All ball objects contain a Ball_ID attribute which is implemented
with unique positive integers. The array element corresponding to zero is set to always
point to the tail of the linked list.

Figure 7.  Data Structure for Storing Pool Balls
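
One way to realize an indexed linked list of this kind is sketched below. It is an illustration under assumptions: the names are hypothetical, the list is made doubly linked so that removal through the index is constant time, and only the Ball_ID is stored in each node. Element zero of the index array points at the tail, as described above.

```c
#include <stdlib.h>

typedef struct BallNode {
    int ball_id;                      /* unique positive integer */
    struct BallNode *prev, *next;
} BallNode;

typedef struct {
    BallNode **index;   /* index[id] -> node or NULL; index[0] -> tail */
    BallNode *head;
    int max_id;
    int length;
} BallManager;

int bm_init(BallManager *m, int max_id)
{
    m->index = calloc((size_t)max_id + 1, sizeof *m->index);
    if (m->index == NULL)
        return -1;
    m->head = NULL;
    m->max_id = max_id;
    m->length = 0;
    return 0;
}

/* Append at the tail in O(1) using index[0]. */
int bm_add(BallManager *m, int id)
{
    if (id < 1 || id > m->max_id || m->index[id] != NULL)
        return -1;
    BallNode *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->ball_id = id;
    n->next = NULL;
    n->prev = m->index[0];
    if (n->prev != NULL) n->prev->next = n; else m->head = n;
    m->index[0] = n;
    m->index[id] = n;
    m->length++;
    return 0;
}

/* Locate through the index and unlink in O(1). */
int bm_remove(BallManager *m, int id)
{
    if (id < 1 || id > m->max_id || m->index[id] == NULL)
        return -1;
    BallNode *n = m->index[id];
    if (n->prev != NULL) n->prev->next = n->next; else m->head = n->next;
    if (n->next != NULL) n->next->prev = n->prev; else m->index[0] = n->prev;
    m->index[id] = NULL;
    m->length--;
    free(n);
    return 0;
}

int bm_is_found(const BallManager *m, int id)
{
    return id >= 1 && id <= m->max_id && m->index[id] != NULL;
}
```

Searching goes through the array in constant time, while traversal follows the list from `head` and visits only the M balls currently stored.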

4.2.3  Design of the Table Sector Object

Functional Description: The table sector object is an instantiation of a class of sectors.
Even though the initial application did not incorporate sectoring, it was desirable
to develop data structures which were easily transportable to the more complex
applications to be designed later. For the initial software version, the pool table
consisted of a single ‘sector’. The sector object defines the sector boundaries. Each
sector is assigned a unique Sector_ID number.

Attributes:

• Sector_ID
• Left_Border
• Right_Border
• Top
• Bottom
• Type_Left_Border
• Type_Right_Border
• Left_Neighbor
• Right_Neighbor
• Is_Left
• Is_Right

Operations: None.

4.2.4  Design of the Table Sector Manager

Functional Description: The table sector manager object stores all of the sector objects
and provides information about the sectors.

Attributes:

• Table_Length
• Table_Width

Operations:

• Determine_Table_Dimensions
• Determine_Sectors
• Get_X_Length
• Get_Y_Length
• Get_Left_Border
• Get_Right_Border
• Get_Top
• Get_Bottom
• Get_Left_Type
• Get_Right_Type
• Do_I_Have_a_Left_Neighbor
• Do_I_Have_a_Right_Neighbor
• Get_Left_Neighbor
• Get_Right_Neighbor
• Print_Sector
• Print_All_Sectors

Detailed Design: Each pool ball has a finite radius and the pool table has a finite area;
therefore, for any given pool ball radius and table area, there is an upper bound on the
number of pool balls which can fit on the table without overlap. Three methods were
considered during the design phase to determine the length and width of the table.


1.  command line argument

2.  constant

3.  dynamic calculation

The first option was deemed too awkward to work with from a user’s point of view.
The second option was favored because it is easy to implement and requires no computations;
however, recompiling becomes necessary if the table dimensions are too small to
accommodate a desired quantity of pool ball objects. The last option avoids recompiling.
Both options two and three above were finally selected by adding a single command line
argument. The default was set to a constant value for length and width. The variable
dimension option (if selected) dynamically calculates the length and width of the table by
using a constant density formula. The density of pool balls to table area was defined by
taking the ratio of the area occupied by 16 pool balls, each having a one-inch radius,
to the area of a 6 foot by 12 foot table. The table length was defined to be twice the
table width; therefore, the known table density and the user specified quantity of pool
balls dynamically determine the length and width of the pool table if the variable table
dimension option is selected in the command line arguments.

4.2.5  Design of the Random Number Generator Object

Functional Description: The random number generator object was borrowed from the
works of Law and Kelton (14). This object is a machine independent pseudo-random
number generator which produces a stream of random numbers given an input seed.

Attributes: Seed.

Operations:

• Uniform
• Exponential
• Normal
• Lognormal

4.2.6  Design of the EventHandler Object


Functional Description: The EventHandler object determines the next event for a given
sector and executes an event which is passed to it.

Attributes: None.

Operations:

• Initialize
• Determine_Next_Event
• Execute_Next_Event

4.2.7  Event Definitions  Four event types are possible for the sequential, non-partitioned
version of the pool balls simulation:


1.  A pool ball striking a vertical cushion.

2.  A pool ball striking a horizontal cushion.

3.  A pool ball striking another pool ball.

4.  A ‘DONE’ event indicating that the simulation is over.

The ‘CUSHION’ events were identified as either horizontal or vertical since the behavior
of the collision differs between them. A ‘COLLISION’ event consists of a collision
between two pool balls and the identification of both ball objects is stored with the event
information. A ‘DONE’ event signifies that the next event has an event time greater than
the user specified simulation time; therefore, the execution of a DONE event terminates
the simulation.

4.2.8  Algorithm Design  Two high level algorithms were considered for the design
of the pool balls simulation. One has a time and space complexity of O(N) and O(N) while
the other has O(N^2) and O(1). AFIT’s iPSC/2 hypercube has 4 megabytes of memory per
node; therefore, memory space was not considered to be a limiting factor for reasonable
quantities of pool balls. The three factors that were considered are granularity, research
time and event list manipulation. The iPSC/2 is a coarse grain machine. DeCegama shows
how performance on distributed processors is affected by the granularity of the software
with respect to the machine’s grain size (9). He concludes that a fine grain algorithm must
be implemented on a fine grain machine to achieve reasonable speedup. The pool balls
simulation is inherently fine grain. This mismatch in granularity was anticipated to result
in poor speedup results. This raises the following question: if speedup results are poor,
does this imply that spatially partitionable models should not be implemented with
conservative paradigms, or does it simply reinforce the assertion that fine grain algorithms
should not be implemented on coarse grain machines? The latter statement fails to address
any of the objectives of this thesis. The following options were available:

1.  Use the O(N) algorithm on the iPSC/2 knowing that the mismatch in granularity
exists.

2.  Use the O(N) algorithm on a fine grain machine.

3.  Use the O(N) algorithm on the iPSC/2, but incorporate spin loops to artificially
raise the computational complexity of the algorithm. This increases the
computations/communications ratio which changes the granularity of the algorithm from fine
to coarse.

4.  Use the O(N^2) algorithm on the iPSC/2 which also artificially raises the
computations/communications ratio.

The first option was dismissed because it fails to address the objectives of this thesis. The
second option was not possible because AFIT has only coarse grain machines (iPSC/1 and
iPSC/2). The third and fourth options are both viable and both were analyzed carefully.

The O(N) algorithm was considered to be more difficult to implement and therefore
would require more time to design, implement and debug. The O(N) algorithm requires
efficient storage of future events known as event list manipulation. Misra has shown that
event list manipulation is the limiting factor toward speedup (18). The O(N^2) algorithm
avoids complex event list manipulation by storing only one event at a time. This is
relatively easy to implement. In the final analysis, time was considered the limiting factor;
therefore, option four was selected.

The basic algorithm for the pool balls simulation revolves around a loop construct.
In each loop, every ball is analyzed to determine which ball will have the minimum next
event time. This event is scheduled by inserting it into the next event queue; thus, the next
event queue is refreshed in every loop. Simultaneous events are both possible and allowed
in which case multiple events are inserted into the next event queue. After scheduling the
next event, the event is removed from the queue and executed. Execution of an event
consists of removing the appropriate pool ball(s) from the ball object manager, updating
the new position, calculating a new velocity, updating the pool ball’s time stamp and
returning the pool ball to the ball object manager. The next event queue is checked for
additional (simultaneous) events. After the next event queue is confirmed to be empty,
the loop starts over again. The simulation ends if the next event stored in the next event
queue is a DONE event. A DONE event is inserted into the next event queue if, in the
determination of the minimum next event, the next event time corresponding to the next
event is greater than the user specified simulation time. This time is specified as a
command line argument. The algorithm may be summarized by the following loop.

While (!Done) loop

    1.  Determine the next event(s).

    2.  Schedule the next event(s).

    3.  while (!empty) loop

            Remove the next event from the queue
            if (Type != DONE)
                Execute the next event
            else
                Done = TRUE

        End loop

End loop
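
The control flow of this loop can be sketched in C with a deliberately reduced model: one axis only, cushion events only, and a single pending event in place of the next event queue. All names here are hypothetical and ball-to-ball collisions and simultaneous events are left out, so the block illustrates the determine/schedule/execute cycle and the DONE test, not the full simulation.

```c
#include <float.h>

typedef enum { EV_CUSHION, EV_DONE } EventType;
typedef struct { double x, vx, time; } Ball1D;     /* one axis, for brevity */
typedef struct { EventType type; double time; int ball; } Event;

/* Steps 1 and 2: scan every ball for its next cushion hit and keep the
 * earliest. If that time exceeds t_end, a DONE event is produced instead. */
static Event determine_next_event(const Ball1D *b, int n,
                                  double len, double r, double t_end)
{
    Event best = { EV_DONE, DBL_MAX, -1 };
    for (int i = 0; i < n; i++) {
        if (b[i].vx == 0.0)
            continue;
        double target = (b[i].vx > 0.0) ? len - r : r;
        double t = b[i].time + (target - b[i].x) / b[i].vx;
        if (t < best.time) {
            best.type = EV_CUSHION;
            best.time = t;
            best.ball = i;
        }
    }
    if (best.time > t_end) {
        best.type = EV_DONE;
        best.time = t_end;
        best.ball = -1;
    }
    return best;
}

/* Returns the number of cushion events executed before DONE. */
int run_simulation(Ball1D *b, int n, double len, double r, double t_end)
{
    int done = 0, executed = 0;
    while (!done) {
        Event ev = determine_next_event(b, n, len, r, t_end);
        if (ev.type != EV_DONE) {           /* step 3: execute the event */
            Ball1D *p = &b[ev.ball];
            p->x += p->vx * (ev.time - p->time);   /* new position */
            p->vx = -p->vx;                        /* new velocity */
            p->time = ev.time;                     /* new time stamp */
            executed++;
        } else {
            done = 1;
        }
    }
    return executed;
}
```

Each pass rescans every ball, which mirrors the "refresh the next event in every loop" behavior described above.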

The basic algorithm used in this simulation has a time complexity of O(N^2) and a
space complexity of O(1). The time complexity stems from the fact that each pool ball
must inspect every other pool ball on the table to determine if and when a collision will
occur. Thus, the first ball must inspect N - 1 balls, the second ball must inspect N - 2
pool balls and the (N - 1)th ball must inspect 1 ball. Thus, the time complexity is:

\sum_{i=1}^{N-1} i = \frac{N^2 - N}{2}

Time Complexity = O(N^2)

The O(N) algorithm, although not implemented, works as follows. During the first
loop iteration, each ball will have inspected every other ball. At most N events will be
feasible, one for each ball and each with different event times. These events will be
stored in memory. After executing an event involving ball B, all events in the event list
containing ball B must be removed. Ball B must then inspect all remaining (N - 1) balls
to regenerate the previously removed events. If the event list data structure is designed
efficiently, at most O(N) time is required to remove events involving ball B and O(N)
time is required to regenerate the new events involving ball B. Thus, the overall time
complexity is O(N) instead of O(N^2).

4.2.9  Design of the Queue Structures  This thesis defines the scheduling of an event
to be the insertion of an event into the next event queue. No other events are inserted into
the next event queue unless they occur at the same simulation time. As each pool ball
is inspected during the determination of the next event, the current minimum next event
is temporarily stored. A simple event structure does not suffice because the possibility
exists of finding an event whose next event time is equal to the current minimum. Such an
event cannot be discarded nor can it replace the current minimum next event. Therefore,
a candidate queue was designed to store the current next event. After all of the pool
balls have been inspected for next events, the candidate queue is guaranteed to contain the
absolute minimum next event. This event is then ‘scheduled’ by placing it on the next
event queue.
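
The behavior required of the candidate queue (replace on a strictly earlier time, append on an equal time, discard otherwise) can be sketched in C as follows. The fixed capacity, the type names and the `cq_` functions are assumptions of this sketch; the thesis does not describe how its candidate queue is stored.

```c
#define MAX_CANDIDATES 64   /* assumed capacity for this sketch */

typedef struct { double time; int ball_a; int ball_b; } Candidate;

typedef struct {
    Candidate items[MAX_CANDIDATES];
    int count;
} CandidateQueue;

void cq_reset(CandidateQueue *q) { q->count = 0; }

/* Offer an event found while inspecting the balls.
 *   later than current minimum   -> discarded
 *   earlier than current minimum -> queue emptied, event becomes the minimum
 *   equal to current minimum     -> appended as a simultaneous event
 * Returns 1 if the event was kept, 0 otherwise. */
int cq_offer(CandidateQueue *q, Candidate c)
{
    if (q->count > 0 && c.time > q->items[0].time)
        return 0;
    if (q->count > 0 && c.time < q->items[0].time)
        q->count = 0;
    if (q->count == MAX_CANDIDATES)
        return 0;
    q->items[q->count++] = c;
    return 1;
}
```

After every ball has been inspected, everything left in `items[0..count-1]` shares the minimum event time and can be moved to the next event queue together.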

4.2.10  Version 1 Structure Chart  The structure chart for the non-partitioned
sequential simulation is shown as Figure 8.

4.2.11  Command Line Arguments  The pool balls simulation design incorporates
ten command line arguments all of which are optional to the user. Each argument has
a default value in the event that an option is not selected. The arguments available are
listed in Table 2.

Table 2.  Command Line Arguments

Argument       Function                        Default
-w             write to disk                   FALSE
-e             error checking                  FALSE
-cp            continuously print to screen    FALSE
-ip            initially print to screen       FALSE
(illegible)    number of balls                 25
(illegible)    number of partitions            1
-f             fixed table dimensions          TRUE
-t#            simulation time                 60.0 sec
-S             collect statistics              FALSE
-ne#           number of events                50

Write to Disk: This option writes the ball state information to disk after each pool ball
changes state. This output file allows the user to inspect the data and to compare
data runs. This option degrades the run time performance of the simulation not only
due to slow disk I/O but also because the host processor performs all of the writing
to disk. As each pool ball is moved, the node processor sends the data to the iPSC/2
host processor for writing. Thus, a heavy penalty is exacted for this command line
argument.

Error Checking: This option examines each pool ball prior to being moved and was
intended for debugging purposes only. A pool ball must lie within the borders of
the sector to which it is assigned and the time tag must be both positive and less
than the event time associated with the requested event. If any of these items is
incorrect, the event is not executed, an error message prints to the screen and the
simulation abruptly terminates.

Continuously  Print  to  Screen:  This  option  prints  all  of  the  pool  ball  state  information 
to  the  screen  each  time  a  pool  ball  is  moved.  This  allows  real-time  examination  of 
the  simulation;  however,  a  hefty  penalty  is  placed  on  run  time  performance. 

Initially  Print  to  Screen:  This  option  prints  the  initial  state  information  for  all  of  the 
pool  balls  after  initialization.  There  is  no  penalty  in  run  time  performance  because 
the  real-time  clock  is  not  started  until  after  initialization. 

Number of Balls: This option specifies the number of pool balls desired for the simulation.

Number  of  Partitions:  This  option  specifies  the  number  of  partitions  desired  for  the 
simulation. 

Fixed Table Dimensions: This option determines whether the table length and width
are set by a pre-defined constant in the software or if a dynamically calculated table
length and width are to be used based upon a pre-defined density constant and the
number of pool balls selected.

Simulation  Time:  This  option  specifies  the  simulation  time. 




Collect Statistics: This option writes the simulation run time as a function of the number
of pool balls to a file. This file can then be used to plot the results of many test
runs without manual data entry.

Number of Events: This option allows an alternate technique to be used to terminate
the simulation. Each node of the iPSC/2 keeps track of the number of events that
have been processed. When using a single node, the simulation may be set to terminate
after processing a specified number of events. This option is not valid when
using multiple nodes for the parallel version.

Figure 8.  Version 1 Structure Chart

4.3  Design of a Spatially Partitioned Sequential Simulation

The basic design of the second simulation version was the same as the first version;
however, additional functionality had to be incorporated and some of the data structures
had to be modified. The high level algorithm was unchanged.

4.3.1  Changes to the Ball Object  The previous simulation design incorporated a
single indexed linked list for the ball object manager data structure. The partitioned
version requires an indexed linked list per sector. To accommodate this change, each
indexed linked list is encapsulated into a record structure; thus, each sector has one record
type. An array of sectors of length P was designed where P equals the number of
sectors. Each array element contained a pointer to the appropriate record type containing
the indexed linked list.

4.3.2 Changes to the Table Object The previous sequential simulation called for
a simple record structure storing the table’s boundary information, since there was
only one sector to contend with. The design of the spatially partitioned version of the
sequential simulation incorporates an array of sectors of size P, where P equals the
number of sectors. Each array element points to a record structure containing the
boundaries of an individual sector. The sectors were designed to be equal in length and
width. The table is partitioned vertically along the X axis. The table may be partitioned
into $P < P_{max}$ sectors, where $P_{max}$ is the number of sectors corresponding to a
sector width greater than the predefined pool ball diameter.
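A minimal C sketch of this partitioning, under stated assumptions: the names `SectorBounds`, `max_sectors` and `partition_table` are hypothetical, the table is assumed to start at the origin, and `max_sectors` returns the largest sector count whose sector width still exceeds the ball diameter.

```c
#include <stdlib.h>

/* Hypothetical record holding one sector's boundaries. */
typedef struct {
    double x_min, x_max;
    double y_min, y_max;
} SectorBounds;

/* Largest sector count whose sector width is still greater than the ball diameter. */
int max_sectors(double table_length, double ball_diameter) {
    if (table_length <= 0.0 || ball_diameter <= 0.0) return 0;
    int p = (int)(table_length / ball_diameter);
    /* A width exactly equal to the diameter is not allowed, so back off. */
    while (p > 1 && table_length / p <= ball_diameter) p--;
    return p < 1 ? 1 : p;
}

/* Slice the table into P equal-width sectors along the X axis. */
SectorBounds *partition_table(double table_length, double table_width, int P) {
    if (P < 1) return NULL;
    SectorBounds *b = malloc((size_t)P * sizeof *b);
    if (b == NULL) return NULL;
    double w = table_length / P;
    for (int i = 0; i < P; i++) {
        b[i].x_min = i * w;
        b[i].x_max = (i + 1) * w;
        b[i].y_min = 0.0;
        b[i].y_max = table_width;
    }
    return b;
}
```

The caller owns the returned array and releases it with `free`.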

4.3.3 Changes to Event Definitions In the previous non-partitioned version, it was
possible for a pool ball to traverse the entire pool table along the X-axis in one step. The
partitioned version requires two or more incremental steps, depending upon the number of
sectors requested. A pool ball must now stop at a sector border, whereupon data replication
may take place, followed by a crossing from one sector to the next sector. The former event
is defined to be a ‘PARTITION’ event while the latter event is defined to be an ‘EXIT’
event. Transfer of control of a pool ball normally takes place during an EXIT event. As the
number of partitions increases, the number of incremental events increases for pool balls
moving in the X direction.

4.3.4 Changes to the Simulation Algorithm The algorithm was changed in the
following manner:

1. ‘Partition’ and ‘exit’ events were added.

2.  Determination  of  the  next  event  was  modified. 

3. Enforcement of Wieland’s data replication was added.

4. Design of the ball object data structure was modified to allow each sector to have its
own ball manager.

5.  Design  of  the  table  data  structure  was  modified  to  allow  storage  of  individual  sector 
information. 

The  basic  algorithm  now  consists  of  the  following  steps: 

1.  Determine  the  next  event. 

2.  Schedule  the  next  event. 

3.  Execute  the  next  event. 

4.  Enforce  Data  Replication  (as  needed). 

To determine the next event, each sector is inspected one at a time. There is one
global candidate next event queue and one global next event queue. As each sector is
inspected, the individual events are compared against the global candidate next event
queue. The time to determine the next event requires searching P sectors, each containing
on average $N/P$ pool balls; thus, the time complexity is reduced from $O(N^2)$ to
$O(N^2/P)$ (P sectors, each searched in $O((N/P)^2)$ time), where N is the total number of
pool balls and P is the number of sectors.
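The effect of sectoring on the search can be illustrated with a small C sketch. It assumes, as a simplification, that candidate collisions are found by examining every pair of balls within a sector; the function name `pair_checks` is hypothetical.

```c
/* Number of ball pairs examined when each sector is searched on its own.
 * balls_per_sector[i] is the number of balls currently in sector i. */
long pair_checks(const int *balls_per_sector, int P) {
    long total = 0;
    for (int i = 0; i < P; i++) {
        long n = balls_per_sector[i];
        total += n * (n - 1) / 2;   /* pairs within sector i only */
    }
    return total;
}
```

For 16 balls in a single sector this gives 120 pair checks; the same 16 balls spread evenly over 4 sectors give 4 x 6 = 24, in line with the roughly P-fold reduction stated above.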

Scheduling the next event remains unchanged. After completing the next event
determination phase, the single candidate queue is guaranteed to contain the minimum next
event for all sectors (the minimum next event for the entire pool table). Scheduling that
event consists of removing the candidate event from the candidate queue and inserting it
into the next event queue.

Executing the event requires knowledge of the sector from which the next event
originated. All of the procedures in the simulation were thus changed to allow this information
to be passed as in-type parameters. The sector identified as enforcing the collision retrieves
the ball(s) from its ball object manager, updates the position(s), calculates and stores the
new velocity trajectories, updates the ball time tag(s) and returns the pool ball(s) to the
sector’s ball object manager.

This simulation used a two-step data replication strategy proposed by Wieland (22).
The  replication  rules  were  established  in  a  communicator  object.  Ball  objects  must  be 
replicated  or  de-replicated  under  the  following  conditions: 

1.  Providing  Visibility:  A  sector  must  be  given  visibility  of  a  pool  ball  if  the  ball 
was  previously  owned  by  another  sector  but  upon  moving,  the  ball  now  lies  within 
the  border  region.  The  center  of  the  ball  must  still  lie  within  the  adjacent  sector; 
otherwise, the gaining sector not only has visibility of the ball but also has control
of  the  ball. 

2.  Providing  Ownership  (Control):  A  sector  must  be  given  control  of  a  pool  ball  if  the 
ball  was  previously  owned  by  another  sector  but  upon  moving,  the  ball  now  has  its 
center  of  mass  within  the  gaining  sector’s  boundaries. 

3.  Removing  Visibility:  A  sector  must  remove  a  ball  from  its  ball  manager  object  if  the 
ball  was  previously  visible  but  upon  moving,  the  ball’s  center  of  mass  lies  outside 
of  the  losing  sector’s  border  region.  This  implies  that  the  ball  is  owned  by  another 
sector  after  moving. 

4.  Removing  Ownership  (Control):  A  sector  must  relinquish  control  of  a  pool  ball  if  the 
ball  was  previously  owned  by  the  losing  sector  but  upon  moving,  the  ball’s  center  of 
mass  now  lies  beyond  the  losing  sector’s  boundaries.  A  ball  meeting  this  condition 
cannot move any further in one step than the edge of the losing sector’s border region.
In this manner, control of the ball is passed to the gaining sector but the losing sector
retains  visibility. 

5.  Updating  Visibility:  A  sector  must  have  an  updated  copy  of  a  pool  ball  if  the  ball 
was previously visible (but not owned) by an adjacent sector and upon moving, the
ball  is  still  visible  (and  still  not  owned)  by  the  adjacent  sector. 

Each  pool  ball  object  has  associated  with  it  a  visibility  flag  and  an  ownership  flag.  All 
pool  balls  owned  by  a  sector  must  also  be  visible  to  the  sector  or  an  error  condition  is  raised. 
The  act  of  providing  visibility  of  a  pool  ball  to  an  adjacent  sector  consists  of  providing 
a  copy  of  the  pool  ball  object  to  the  adjacent  sector’s  ball  object  manager.  The  only 
difference  between  the  two  copies  is  the  status  of  the  ownership  flag.  Removing  visibility 
consists  of  requesting  an  adjacent  sector’s  ball  object  manager  to  delete  the  replicated 
ball.  Changing  of  ownership  status  is  analogous.  Since  this  version  of  the  simulation 
is  still  implemented  on  a  single  processor,  all  commands  may  be  implemented  directly 
through  memory.  Messages  are  not  required. 
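The five replication conditions can be sketched in C from the point of view of a single sector. This is an illustration only: the names are hypothetical, and the border region is assumed to be a strip of width `border` just outside each X boundary of the sector, which the text above does not specify.

```c
/* Visibility and ownership flags of one ball as seen by one sector. */
typedef struct {
    int visible;
    int owned;
} BallStatus;

typedef enum {
    NO_ACTION,
    PROVIDE_VISIBILITY,
    PROVIDE_OWNERSHIP,
    REMOVE_VISIBILITY,
    REMOVE_OWNERSHIP,
    UPDATE_VISIBILITY
} Action;

/* Status of a ball with centre cx for the sector spanning [x_min, x_max).
 * Assumption: the border region extends `border` beyond each boundary. */
BallStatus classify(double cx, double x_min, double x_max, double border) {
    BallStatus s;
    s.owned = (cx >= x_min && cx < x_max);
    s.visible = s.owned || (cx >= x_min - border && cx < x_max + border);
    return s;
}

/* Replication step this sector needs after a ball has moved. */
Action replication_action(BallStatus before, BallStatus after) {
    if (!before.owned && after.owned) return PROVIDE_OWNERSHIP;
    if (before.owned && !after.owned) return REMOVE_OWNERSHIP;   /* visibility is retained */
    if (!before.visible && after.visible) return PROVIDE_VISIBILITY;
    if (before.visible && !after.visible) return REMOVE_VISIBILITY;
    if (after.visible && !after.owned) return UPDATE_VISIBILITY;
    return NO_ACTION;   /* owned before and after: nothing to replicate */
}
```

An owned ball is always reported as visible, which matches the requirement that every ball owned by a sector must also be visible to it.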
4.4 Design of a Parallel Simulation

The basic design of the parallel simulation was the same as the sequential simulation
with spatial partitioning and data replication. Some additional functionality had
to be added to enforce the conservative synchronization paradigm and some of the data
structures had to be changed to accommodate the distributed environment.

4.4.1 Changes to the Ball Object The partitionable sequential design encapsulated
a ball object manager for each sector. An array of length P contained pointers to each
sector’s ball manager. For the distributed simulation, no one node will ever contain all
of the table sectors; therefore, the data structure was changed as follows. Each node has
one data structure consisting of an array of pointers representing a list of ball managers.
The length of each node’s array is $P/M$, where P is the number of table sectors and M is the
number of nodes. This ratio represents the number of partitions per node, which is defined
to be the same for all nodes. Therefore, a design constraint limits the number of sectors
to be an integer multiple of the number of nodes.
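The constraint that P be a multiple of M, and the resulting per-node array length, can be sketched in C. The function names are hypothetical, and the mapping assumes that each node hosts one contiguous block of sectors.

```c
/* Sectors hosted by each node, or -1 if P is not a positive multiple of M. */
int sectors_per_node(int P, int M) {
    if (M < 1 || P < M || P % M != 0) return -1;
    return P / M;
}

/* Node hosting a given global sector, assuming contiguous blocks per node. */
int node_of_sector(int sector, int P, int M) {
    int per = sectors_per_node(P, M);
    if (per < 0 || sector < 0 || sector >= P) return -1;
    return sector / per;
}

/* Index of the sector within its node's local array of length P/M. */
int local_index(int sector, int P, int M) {
    int per = sectors_per_node(P, M);
    if (per < 0 || sector < 0 || sector >= P) return -1;
    return sector % per;
}
```

With P = 8 and M = 4, each node holds two sectors, and global sector 5 is local sector 1 on node 2.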

4.4.2 Changes to the Table Object The partitionable sequential design encapsulated
each sector’s boundary information in an array of length P. The distributed design assigns
one array of sector information to each node and the length of each array is reduced from
P to $P/M$.

4.4.3 Changes to the Candidate Queue Structure Both sequential simulation designs
encapsulated a single candidate next event queue for the entire pool table. The
distributed design encapsulates a candidate next event queue for each sector. Each node is
allocated an array of candidate queues representing the hierarchical class of queues. The
length of each array is $P/M$. This design decision is important because a candidate event is
no longer scheduled based solely upon the criterion that it has the smallest next event time.
For the distributed simulation, it is highly desirable to schedule as many sectors as possible
to achieve efficient parallelism. Obviously, if more than one sector is scheduled to execute
a candidate event, one event has the smallest next event time while all other scheduled
events have a greater next event time. The decision to schedule a candidate event
must now rest on determining a minimum safe time per sector. This is discussed later.

4.4.4 Changes to the Next Event Queue Structure Both sequential simulation designs
allocated a single next event queue defined by the Institute. The distributed design
must allocate one next event queue per node to reduce internode communications. This is
discussed further in Chapter V.

4.4.5 Changes to the Simulation Algorithm The basic high level algorithm was
changed to incorporate the minimum safe time calculation. The formulation of a minimum
safe time (MST) is explained in detail in Chapter V. The high level algorithm for
the parallel application consists of the following steps:


While (! Done) loop

    1. For each node in parallel, determine the candidate next event.

    2. For each node in lockstep, determine the minimum safe time.

    3. For each node in parallel, schedule a candidate event if
       and only if a sector’s candidate next event time is
       less than or equal to the sector’s calculated minimum
       safe time.

    4. While (! Empty) loop
           If (Type != DONE)
               For each node in parallel, execute the next event,
           else
               Done = TRUE

    5. For each node in lockstep, enforce Wieland’s data
       replication strategy.

End Loop


The above algorithm is repeated in a loop until the next event time is greater than
the user specified simulation time. The loop thus created is performed in lock step,
synchronizing at exactly two points. Steps 2 and 5 must be in lock step and are performed
sequentially. Parallelism may be achieved during steps 1, 3 and 4. For the case of step 1,
not only is parallelism achieved, but the search space per sector is reduced from $O(N^2)$ to
$O(N^2/P^2)$. The gains produced from the decreased search time and parallelism are reduced
by the increased number of incremental events created by PARTITION and EXIT event
types. These events are required for simulation correctness but are not of real interest in
the simulation. Each additional sector adds two more incremental steps for a pool ball
to traverse the table in the X-axis direction. This increase is not linear because a
multi-partitioned table can result in more than one candidate event meeting its minimum safe
time. Thus, one search can yield multiple event executions.
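Step 3 of the algorithm can be sketched in C as a simple comparison per sector. The function name `schedule_sectors` and the array layout are assumptions made for illustration; how the minimum safe times themselves are obtained is the subject of Chapter V.

```c
/* Mark each sector whose candidate next event time does not exceed its
 * minimum safe time. scheduled[i] is set to 1 or 0; the return value is
 * the number of sectors scheduled in this iteration. */
int schedule_sectors(int P, const double *cand_time, const double *mst,
                     int *scheduled) {
    int count = 0;
    for (int i = 0; i < P; i++) {
        scheduled[i] = (cand_time[i] <= mst[i]) ? 1 : 0;
        count += scheduled[i];
    }
    return count;
}
```

A return value greater than one corresponds to the case noted above, where a single search yields multiple event executions.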

The software design of the parallel pool balls simulation is represented by the leveled
dataflow diagrams of Figures 9, 10 and 11. The process bubbles having multiple, overlaid
bubbles represent processes that execute in parallel. There is no current standard for
parallel data flow diagrams.

Figure 11. Level 2 Data Flow Diagram for Process 4.0




V.  Parallel  Simulation  Design  and  Implementation 


5.1  Introduction 

This chapter discusses the design of the pool balls simulation model and the algorithm
used to enforce distributed simulation synchronization in a conservative environment.
Alternative algorithms are discussed and their respective advantages and disadvantages are
analyzed.

5.2  Design  of  a  Parallel  Simulation  Model 

Design of a distributed discrete event simulation requires selection of an appropriate
model. Schruben’s concepts of transient and resident models were analyzed for consideration
(20). The following analysis explains why the resident entity model was selected.

5.2.1  Modeling  Pool  Balls  as  Transient  Entities  With  this  model,  each  pool  ball  is 
an  entity  and  parallelism  is  possible  by  distributing  the  pool  balls  to  varying  nodes  of  a 
distributed  processor.  In  order  for  a  node  to  determine  if  a  collision  will  occur  between 
one  of  its  ball  objects  and  another  ball  object,  the  node  must  know  of  the  existence  of  the 
other  ball.  This  could  be  done  in  any  of  several  manners. 

A simple approach is to appoint a central manager. This manager has the state
information concerning every ball. Nodes that need to gain access to ball state information
request the information from the central manager. If there are M nodes and N pool balls,
this approach would require each node to cyclically communicate N messages. With M
nodes, the time complexity for communications is $O(N \log M)$. This approach requires
each node to wait upon the central manager, thus forming a bottleneck. The run time
performance is then reduced to the speed of the central manager, thereby driving the
simulation to approach sequential performance (3). In terms of search time, each node would
require a complete search of all N pool balls. The best possible algorithm incorporating
all N pool balls is $O(N)$. The algorithm used in this thesis would require $O(N^2)$ time.

An alternative approach to modeling the pool balls as transient entities assigns a
unique subset of the N pool balls to each node such that $S_i \cap S_j = \emptyset$, $S_i \cup S_j = S$
and $S_i, S_j \subset S$, where $S$ is the set of N pool balls. Each node is provided a mapping of
pool balls to nodes. A table look-up function provides each node with the capability of
requesting ball state information directly from the appropriate owner. If the N pool balls
are evenly divided between M nodes, then each node must cyclically request ball state
information from all other nodes. Therefore, each node communicates M-1 times to the
remaining M-1 nodes and each node must provide $N/M$ pool balls. If every node requires the
same quantity of messages, the time complexity for this model is $O(N \log M)$. While the
time complexity appears to be the same between this approach and the central manager
approach, this technique is actually superior because there are no bottlenecks to form.
Each node would still have an $O(N)$ search space in which to determine a candidate next
event.

A third approach assigns a copy of all N pool balls to each node. Each node maintains
two lists such that $S_i$ is the set of pool balls owned by node$_i$ and $S_j$ is the set of pool balls
not owned by node$_i$, where $S_i, S_j \subset S$ and $(S_i \cap S_j = \emptyset) \wedge (S_i \cup S_j = S)$, $S$ being
the set of all N pool balls. As each node changes the ball state information of a pool ball
in the set $S_i$, it must broadcast the state change to all other nodes maintaining a copy of
that ball in their sets $S_j$. Given M nodes, each event execution requires M messages
communicated in $O(\log M)$ time. If every node can process simultaneously, then the
communications time approaches $O(M \log M)$. Since every node maintains the set $S$ of all pool
balls, the search time is at best $O(N)$. Bottlenecks do not occur. This approach was
considered for the Sharks World simulation but rejected during the design phase (8).

5.2.2 Modeling the Pool Table as Multiple Resident Entities With this model, the
pool  table  is  partitioned  into  multiple  slices.  The  slices  can  be  of  any  shape  and  can  be  one 
or  two  dimensional.  The  following  discussion  considers  only  one  dimensional  partitioning, 
such  as  slicing  the  pool  table  along  the  X  or  Y  axis. 

If the pool balls are uniformly distributed, each node will have approximately $N/M$
pool balls. If, $\forall i$, node$_i$ has every pool ball lying within a border region, then using
Wieland’s data replication strategy, pool balls must be replicated between adjacent
nodes via message passing. Given M nodes, the communications time complexity worst
case is $O(N)$; however, this depends upon the predicate that each node$_i$ has all pool balls
lying within border regions. Furthermore, each node need only communicate $N/M$ pool balls
once during initialization. After initialization, at most two pool balls per sector can move
(representing a collision event). If each sector can execute an event simultaneously, then
at most $O(M)$ messages must be sent. This is based upon the constraint that nodes need
only communicate with their nearest neighbor. This time complexity is further reduced
based upon the probability that a sector will have all of its $N/M$ pool balls lying within a
border region. Therefore, the communications time is better than $O(M)$. If each node is
partitioned into multiple sectors, then each sector has approximately $N/P$ pool balls, where
P is the number of sectors for the entire pool table. If each sector$_i$ has every pool ball
lying within a border region, and if sectors assigned to the same node are adjacent to one
another and communicate via memory at memory speeds, then the initial communications
time reduces to $O(\frac{N}{P} M)$. Since $P > M$, the communications time is less than $O(N)$ with
or without parallelism. If $P \gg M$, the communications time approaches constant time. Adding
potential parallelism in communications and the probability of having nominal percentages
of pool balls lying within border regions results in near constant communications time even
for $P > M$. In terms of search time, the time complexity reduces from $O(N)$ best case to
$O(N/P)$, $P > M$, and reduces to $O(N^2/P^2)$ using the $O(N^2)$ algorithm presented in Chapter IV.
Therefore, the resident entity model is superior to the transient entity model.

5.3 Developing the Minimum Safe Time Calculation

A  conservative  synchronization  paradigm  requires  a  logical  process  to  postpone  the 
execution  of  an  event  if  there  is  a  possibility  of  receiving  an  out-of-sequence  message.  Each 
out-of-sequence  message  has  an  associated  time  stamp.  This  chapter  will  show  that  it  is  not 
possible  to  exactly  calculate  the  value  of  the  time  stamp  for  an  out-of-sequence  message; 
however,  it  is  possible  to  estimate  it.  An  estimator  is  shown  to  be  valid  if  it  guarantees 
to  be  less  than  or  equal  to  the  time  of  arrival  of  the  first  transient  ball  message.  Bounds 
are  placed  on  the  estimate  from  which  an  estimator  is  proven  to  be  valid.  Simulation 
progress  using  the  estimator  is  also  proved.  This  chapter  develops  and  presents  three 
unique  estimators  all  of  which  are  valid;  however,  only  two  of  them  guarantee  progress. 
These  two  estimators  are  analyzed  and  compared.  Each  estimator  has  advantages  and 
disadvantages.  This  chapter  will  state  which  of  the  estimators  was  selected  for  design 
implementation  and  why. 


Definition 5.1: A transient ball message is a message containing a pool ball and all of
its associated state information sent from one sector to another as a result of the pool ball
crossing a sector border. Let $u$ be the time of a transient ball message, $\nu$ be a discrete time
interval representing the simulation iteration number, and $i$ be the sector which receives
it; then $u_i(\nu)$ is the time of arrival at sector $i$ of the next chronological transient ball
message. If the simulation time of sector $i$ is $w_i$, then the condition $u_i(\nu) < w_i$ defines
the transient ball message corresponding to $u_i(\nu)$ to be an out-of-sequence message.


Definition 5.2: The set of event types for the pool balls simulation consists of five types:
partition, exit, collision, vertical cushion, and horizontal cushion events. Let the set of
event types be denoted by $S$; then

$$S = \{PART, EXIT, COLL, VERT, HOR\} \qquad (21)$$


Definition 5.3: An event E in sector $i$ is defined as the tuple

$$E_i = (t_i(e_i, \nu), B) \qquad (22)$$

where

• $e_i$ is an event type, $e_i \in S$,

• $\nu$ is a discrete time denoting the simulation iteration number,

• $t_i(e_i, \nu)$ is the time of the event $E_i$, and

• $B$ is the set of pool balls associated with the event $E_i$.


Definition 5.4: The Minimum Safe Time for sector $i$ is an estimate of the time of the next
transient ball message to be received by $i$. This estimate is denoted $MST_i(\nu)$. The three
estimates developed by this thesis are denoted $MST_i^1(\nu)$, $MST_i^2(\nu)$, and $MST_i^3(\nu)$.


Lemma 5.1: The minimum safe time in sector $i$ must be greater than zero and less than
or equal to the time of the first transient ball message received by sector $i$ to be a valid
estimate of $u_i(\nu)$. This is stated mathematically as

$$0 < MST_i(\nu) \le u_i(\nu) \qquad (23)$$

Proof: Due to monotonicity of events, the time of arrival of the first transient ball message
must be greater than zero. By definition of conservative, a logical process can only execute
an event if the event time is less than or equal to the time of arrival of the next transient ball
message. Suppose that the next event time $t_i(e_i, \nu)$ were greater than $u_i(\nu)$; then execution
of $E_i$ would not be allowed. Suppose also that an estimate of $u_i(\nu)$ were calculated. Let
this estimate be $MST_i(\nu)$ such that $MST_i(\nu) > u_i(\nu)$. If the condition existed such that
$t_i(e_i, \nu) \le MST_i(\nu)$, resulting in the execution of $E_i$, then the conservative property would
be violated and simulation correctness no longer guaranteed. Therefore, an estimator of
$u_i(\nu)$ must be less than or equal to $u_i(\nu)$.


Definition 5.5: P is defined as the predicate of executing an event $E_i = (t_i(e_i, \nu), B)$
such that

$$P(E_i) = \begin{cases} TRUE & \text{if } t_i(e_i, \nu) \le MST_i(\nu) \\ FALSE & \text{otherwise} \end{cases} \qquad (24)$$


Definition 5.6: At any instant in time, sector $i$ has a set of pool balls in $i$. Denote this
set of pool balls $G_i(\nu)$; then

$$G_i(\nu) = \{ b_j \mid b_j \text{ is in sector } i \} \qquad (25)$$

where $b_j$ is a pool ball with ball identification number $j$. All values of $j$ are unique.


Definition 5.7: By design, every sector in the simulation has the same dimensions. Every
pool ball has a minimum time to cross a sector based upon its X-axis velocity vector. Then

$$TTC_i(e_i, \nu) = \frac{W}{|V_x(B)|} \qquad (26)$$

where $e_i$ is an event type in $S$, $\nu$ is an iteration number, $W$ is the width of a sector, $B$ is
a pool ball in $G_i$ corresponding to the event $E_i = (t_i(e_i, \nu), B)$, and $V_x(B)$ is the X-axis
velocity of the pool ball in $E_i$.


Lemma 5.2: The time of arrival at sector $i$ of the first transient ball message, $u_i(\nu)$,
satisfies

$$u_i(\nu) \ge \min \{ t_{i \pm n}(e_{i \pm n} = PART, \nu) + (n - 1) \cdot TTC_{i \pm n}(e_{i \pm n} = PART, \nu) \} \qquad (27)$$
$$\forall n : n \ne 0,\ i - n \ge 0,\ i + n < P$$

where $i \in \{0, 1, 2, \ldots, (P - 1)\}$ and P is the number of sectors specified by the user.
Proof: Given the set of event types $S$, a ball cannot arbitrarily enter sector $i$ from sector
$j$ without first being scheduled for a partition event in $j$. This premise is based on the
definition of Wieland’s two step sectoring strategy. A pool ball in a sector adjacent to
sector $i$ produces a transient ball message to sector $i$ as a direct consequence of a partition
event $E_{(i \pm 1)} = (t_{(i \pm 1)}(e_{(i \pm 1)} = PART, \nu), B)$. A pool ball in sectors $(i + 2)$ or $(i - 2)$ must have
a partition event in order to reach sectors $(i + 1)$ or $(i - 1)$ respectively, followed by the
unobstructed traversal of sectors $(i + 1)$ or $(i - 1)$ to reach sector $i$. Similarly, a pool ball
in sector $(i + n)$ or $(i - n)$ must have a partition event in sector $(i + n)$ or $(i - n)$ followed
by traversing $(n - 1)$ sectors to reach sector $i$. This results in equation 27 of Lemma 5.2.


Lemma 5.3: The time of the earliest transient ball message which can be received by
sector $i$, $u_i(\nu)$, is not solvable for the pool balls simulation.

Proof: If the next event $E_i$ is not a partition event, then $t_i(e_i, \nu) < t_i(e_i = PART, \nu')$
where $\nu' > \nu$ by definition of monotonicity of events. The ball in sector $i$ which has
the event $E_i$ is in the set of pool balls $G_i(\nu)$. The set of pool balls $G_i(\nu)$ cannot be
propagated in time to calculate $E_i = (t_i(e_i = PART, \nu'), B)$ because the set of pool
balls $G_i(\nu')$ cannot be determined. The set $G_i(\nu')$ cannot be determined because during
the time interval corresponding to $(\nu' - \nu)$, additional pool balls may arrive in sector $i$
from sectors $(i \pm 1)$. By definition of Wieland’s sectoring strategy, sector $i$ knows only
of the existence of pool balls in $i$ and not of any other sector; therefore, it is impossible
for sector $i$ to predict the arrival of additional pool balls in the interval $(\nu' - \nu)$. Since
this is true, it is impossible for sector $i$ to predict the event $E_i = (t_i(e_i = PART, \nu'), B)$.
Without knowing $E_i$, the values $t_i(e_i = PART, \nu')$ cannot be determined to solve equation
27. Without $B$ in $E_i$, it is impossible to determine $V_x(B)$ and therefore it is impossible
to determine $TTC_i(e_i = PART, \nu')$ from Definition 5.7. This is true not only for sector $i$
but for all sectors in $\{0, 1, 2, \ldots\}$; therefore, $u_i(\nu)$ is not solvable.


Definition 5.8: For any sector $i$ and any iteration $\nu$, the set of pool balls $G_i(\nu)$ has a
pool ball whose X-axis velocity is greater than or equal to all of the other balls in $G_i(\nu)$.
From Definition 5.7, this ball will have the minimum time to cross a sector. Denote this
minimum time to cross by $TTC_{i,min}(\nu)$; then,

$$TTC_{i,min}(\nu) = \min(TTC_i(e_i, \nu)) \qquad (28)$$

for all pool balls in $G_i(\nu)$.
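The time to cross of Definition 5.7 and the per-sector minimum of Definition 5.8 can be sketched in C as follows. The function names are hypothetical, and two choices are assumptions of this sketch rather than statements from the text: the speed is taken as the magnitude of the X-axis velocity, and a ball with zero X-axis velocity is given an infinite time to cross.

```c
#include <math.h>

/* Time for a ball with X-axis velocity vx to cross a sector of width W.
 * Assumption: a ball that does not move in X never crosses (INFINITY). */
double time_to_cross(double W, double vx) {
    double speed = vx < 0.0 ? -vx : vx;
    if (speed == 0.0) return INFINITY;
    return W / speed;
}

/* Minimum time to cross over the n balls of one sector; the fastest ball
 * in X gives the minimum. Returns INFINITY for an empty sector. */
double min_time_to_cross(double W, const double *vx, int n) {
    double best = INFINITY;
    for (int k = 0; k < n; k++) {
        double t = time_to_cross(W, vx[k]);
        if (t < best) best = t;
    }
    return best;
}
```

For a sector of width 2.0 holding balls with X velocities 1.0, -4.0 and 2.0, the minimum time to cross is 2.0 / 4.0 = 0.5.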


Definition 5.9: The estimator of $u_i(\nu)$, denoted $MST_i^1(\nu)$, is defined as

$$MST_i^1(\nu) = \min ( t_{i \pm n}(e_{i \pm n}, \nu) + (n - 1) \cdot TTC_{i \pm n, min}(\nu) ) \qquad (29)$$
$$\forall n : n \ne 0,\ i - n \ge 0,\ i + n < P$$

where $i \in \{0, 1, 2, \ldots, (P - 1)\}$, and P is the number of sectors specified by the user.
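Equation 29 can be sketched in C as a scan over all other sectors. This is an illustrative reading of the definition, not the thesis implementation: the name `mst1` and the array layout are hypothetical, and the estimator is taken over every sector other than $i$ at distance $n$ sectors.

```c
#include <float.h>

/* First minimum safe time estimator for sector i.
 * next_time[k] is the next event time of sector k;
 * ttc_min[k] is the minimum time to cross of sector k.
 * Returns DBL_MAX when there is no other sector (P == 1). */
double mst1(int i, int P, const double *next_time, const double *ttc_min) {
    double best = DBL_MAX;
    for (int k = 0; k < P; k++) {
        if (k == i) continue;
        int n = (k > i) ? (k - i) : (i - k);   /* distance in sectors */
        double bound = next_time[k] + (n - 1) * ttc_min[k];
        if (bound < best) best = bound;
    }
    return best;
}
```

As a worked case, take four sectors with next event times 5, 3, 4, 1 and a minimum time to cross of 2 in every sector. The estimator for sector 0 is min(3 + 0, 4 + 2, 1 + 4) = 3, and for sector 3 it is min(5 + 4, 3 + 2, 4 + 0) = 4, so sector 3, whose next event time is 1, may execute.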


Theorem 5.1: The estimator $MST_i^1(\nu)$ is valid.

Proof: Any estimator of $u_i(\nu)$ which satisfies Lemma 5.1 is valid and simulation correctness
is therefore guaranteed. Lemma 5.1 can be shown to be satisfied as follows. The next event
$E_i$ has an event time which is always greater than zero by definition of monotonicity of
events. The maximum X-axis velocity of any pool ball in $G_i(\nu)$ is always less than infinity;
therefore, the minimum time to cross any sector, $TTC_{i,min}(\nu)$, is always greater than zero.
Therefore, $MST_i^1(\nu) > 0$ for all $i$ in $\{0, 1, 2, \ldots\}$.

If $e_i \ne PART$ then $t_i(e_i \ne PART, \nu) < t_i(e_i = PART, \nu')$ by definition of monotonicity.
From Definition 5.8, $TTC_{i,min}(\nu) \le TTC_i(e_i = PART, \nu)$. Therefore, $MST_i^1(\nu) \le u_i(\nu)$.
Lemma 5.1 is satisfied; therefore, $MST_i^1(\nu)$ is valid.


Theorem 5.2: The estimator $MST_i^1(\nu)$ guarantees progress; that is, there exists a sector
whose event $E_i$ can be executed for all $\nu$.

Proof: Given P sectors where P is specified by the user, there exists a sector $i$, $i \in
\{0, 1, 2, \ldots, P - 1\}$, whose next event time is less than or equal to the next event time for
all other sectors. From Definition 5.9, the minimum safe time estimator is the minimum of
all other sectors’ next event times plus some overhead. Therefore, sector $i$’s minimum safe
time is at best the minimum of the remaining event times, but sector $i$’s event time is less
than or equal to all of the others; therefore, sector $i$’s next event time must be less than or
equal to its minimum safe time estimate. From Definition 5.5, sector $i$ can execute. This
is true for all $\nu$; therefore, progress is guaranteed.
5.3.1 Additional Properties of $MST_i^1(\nu)$ Two interesting properties of $MST_i^1(\nu)$
can be shown. First, no two adjacent sectors can execute an event simultaneously if their
event times are not equal. Second, the maximum number of sectors which can execute
simultaneously is $\lceil P/2 \rceil$, where P is the number of sectors specified by the user. This has
implications for sector assignments which will be discussed later.


Lemma 5.4: Adjacent sectors cannot both meet their minimum safe times using $MST_i^1(\nu)$
as an estimator of $u_i(\nu)$ if their event times are not equal.

Proof: Let two adjacent sectors be denoted by $(i)$ and $(i + 1)$. If $t_i(e_i, \nu) \ne t_{(i+1)}(e_{(i+1)}, \nu)$
then either $t_i(e_i, \nu) > t_{(i+1)}(e_{(i+1)}, \nu)$ or $t_{(i+1)}(e_{(i+1)}, \nu) > t_i(e_i, \nu)$ by definition. Both
cases can be shown to have the property that at least one of the two sectors cannot execute
its event.

CASE 1: $t_i(e_i, \nu) > t_{(i+1)}(e_{(i+1)}, \nu)$

From Theorem 5.1, $MST_i^1(\nu)$ is the minimum of all other sectors’ event times plus some
overhead. For adjacent sectors, the value of $n$ in equation 29 is one; therefore, the additive
term for the time to cross is zero. Thus, when comparing adjacent sectors only, the
minimum safe time of sector $i$ is the minimum of its two neighbors’ next event times.
Therefore, $MST_i^1(\nu)$ is at most equal to $t_{(i+1)}(e_{(i+1)}, \nu)$ such that $MST_i^1(\nu) \le t_{(i+1)}(e_{(i+1)}, \nu)$.
Since $t_i(e_i, \nu)$ is greater than $t_{(i+1)}(e_{(i+1)}, \nu)$, then $t_i(e_i, \nu) > MST_i^1(\nu)$ and execution is not
allowed by Definition 5.5.

CASE 2: $t_{(i+1)}(e_{(i+1)}, \nu) > t_i(e_i, \nu)$

The argument of case 1 is the same for case 2, resulting in $MST_{(i+1)}^1(\nu) \le t_i(e_i, \nu)$ and
$t_{(i+1)}(e_{(i+1)}, \nu) > t_i(e_i, \nu)$; therefore, $t_{(i+1)}(e_{(i+1)}, \nu) > MST_{(i+1)}^1(\nu)$ and execution is not
allowed by Definition 5.5. Therefore, for both cases, at least one sector of two adjacent
sectors cannot execute its next event if their next event times are not equal.


Lemma 5.5: If each sector i in {0, 1, 2, ..., P-1} has a unique next event time, then the
maximum number of sectors that can execute their event in parallel is $\lceil P/2 \rceil$, where P is the
number of sectors specified by the user.

Proof: From Lemma 5.4, no two adjacent sectors can execute if their event times are
unique. Since the pool table is partitioned in a linear array, the maximum number of
non-adjacent sectors is $\lceil P/2 \rceil$.
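
The counting argument behind Lemma 5.5 can be checked by brute force for small arrays. The sketch below assumes the bound is read as $\lceil P/2 \rceil$ and that "non-adjacent" means no two chosen sector indices differ by one; the function name `max_nonadjacent` is hypothetical and is not part of the thesis software.

```python
from itertools import combinations
from math import ceil


def max_nonadjacent(p: int) -> int:
    """Largest number of sectors in a linear array 0..p-1 with no two adjacent.

    Brute force over subsets, largest first; only intended for small p.
    """
    for size in range(p, 0, -1):
        for subset in combinations(range(p), size):
            # combinations() yields indices in increasing order, so it is
            # enough to compare each index with its successor.
            if all(b - a > 1 for a, b in zip(subset, subset[1:])):
                return size
    return 0


if __name__ == "__main__":
    for p in range(1, 9):
        print(p, max_nonadjacent(p), ceil(p / 2))
```

For every P from 1 to 8 the brute-force count agrees with the ceiling of P/2, which is consistent with the two simultaneous executions seen in a four-sector table.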

5.3.2 Analysis of $MST_i^1(\nu)$  It is clear from Definition 5.9 that every sector must
have knowledge of every other sector's candidate next event and minimum time to cross.
Global communications are thus required. Several techniques are available to accomplish
this. The simplest technique is to have each sector broadcast its next event time and
maximum velocity to every other sector. Given M nodes, this technique requires $O(M^2)$
communications time, and each message requires more than two orders of magnitude more
time to process than a floating point division (1). To reduce the number of communications,
each node could communicate in a logarithmic fashion thereby requiring only $O(M \log M)$
communications time. The lower bound on communications is $O(M)$ if each node communicates
only with its immediate neighbors (in terms of a pool table sector neighbor, not
a hypercube node neighbor). Each of the techniques requires every node to wait for the
slowest node. After each sector determines its candidate next event, each sector must calculate
its MST. To do this, each sector must know every other sector's candidate next event
time and every other sector's fastest ball velocity (X-axis only). The node which finishes
calculating its candidate next event first must necessarily wait to calculate its MST until
the last node calculates its candidate next event. A broadcast communications scheme
cannot eliminate the potential wait state. Therefore, the $O(M)$ communications scheme is
the optimum implementation. This scheme requires an end sector to communicate with its
immediate neighbor. In this manner, sector 0 sends its data to sector 1. Sector 1 combines
its data with that of sector 0 and sends a single message to sector 2. Finally, sector (P-1)
receives a message from sector (P-2) thereby giving sector (P-1) all of the data from every
other sector. This data can then be passed back to each sector. Parallelism can be achieved
by recognizing the independence of sector 0 and sector (P-1). As sector 0 sends its data to
sector 1, sector (P-1) can send its data to sector (P-2) in parallel. This technique performs
in lockstep and requires every sector to wait on the two end sectors.

5.4 Developing an Alternative MST Calculation

Chandy  and  Misra’s  paradigm  requires  the  following  constraints: 

1. $LP_i$ sends $LP_j$ a message if and only if $LP_i$ has an edge connecting $LP_j$ (6:443).

2. There exists a prespecified constant $\epsilon$ such that

$t_k - t_{k-1} > \epsilon$ for $k = 1, 2, \ldots$ (6:442).

3. The minimum safe time for sector i is the minimum of the message tuples $(t_k, m_k)$
received from all input arcs $(T_{ik})$ (6:444).

The MST calculation stated in Definition 5.9 is slightly different from that proposed
by Chandy and Misra. Specifically, items one and three above disallow global communications.
Chandy and Misra require an MST based only upon the input arcs represented
by the edges connecting each $LP_i$. In the strictest sense, Definition 5.9 does not adhere
to a true Chandy-Misra paradigm. It is desirable to develop an estimator that conforms
to Chandy and Misra because their paradigm avoids global communications. This factor
allows their paradigm to be scalable across any cube size whereas $MST_i^1(\nu)$ is not scalable.
As the number of nodes increases, the communications overhead of $MST_i^1(\nu)$ can be
expected to negate any gains from potential parallelism.

5.5  Developing  a  Second  Minimum  Safe  Time  Calculation 

This section develops the estimator $MST_i^2(\nu)$ which will be shown to be valid but
does not guarantee progress. The estimator is derived from $MST_i^1(\nu)$.


Definition 5.10: There exists an upper bound on the velocity for any pool ball. Due
to the conservation of energy and momentum, the total energy of all of the pool balls
will not change after initialization. Therefore, there is a maximum velocity that any pool
ball can have in the X-axis direction based upon the initialization values and there is a
corresponding global minimum time to cross any sector. The absolute minimum time to
cross a sector for any pool ball once initialized is denoted $TTC_{global\_min}$.
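
The thesis does not state the formula used to obtain this bound, so the following is only one way it could be computed. The sketch assumes all balls have equal mass, in which case conservation of kinetic energy means no single ball can ever move faster than the square root of the sum of the squared initial speeds; dividing the sector width by that speed gives a conservative crossing time. The function name `ttc_global_min` and its arguments are hypothetical.

```python
import math


def ttc_global_min(velocities, sector_width):
    """Conservative lower bound on the time for any ball to cross one sector.

    Assumption (not from the thesis): equal ball masses and perfectly elastic,
    frictionless motion, so total kinetic energy is constant and bounds the
    speed of any single ball.
    velocities   -- iterable of (vx, vy) initial velocity pairs
    sector_width -- sector extent along the X axis
    """
    v_max = math.sqrt(sum(vx * vx + vy * vy for vx, vy in velocities))
    if v_max == 0.0:
        return math.inf  # nothing moves, so no sector is ever crossed
    return sector_width / v_max


if __name__ == "__main__":
    print(ttc_global_min([(3.0, 0.0), (0.0, 4.0)], 10.0))
```

Any tighter bound on the X-axis velocity would also satisfy Definition 5.10; this one simply errs on the safe side.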


Definition 5.11: The estimator of $\omega_i(\nu)$, denoted $MST_i^2(\nu)$, is defined as

$$MST_i^2(\nu) = \min\left(t_{i\pm1}(e_{i\pm1},\nu),\ TTC_{global\_min}\right) \tag{30}$$
Theorem 5.3: The estimator $MST_i^2(\nu)$ is valid.

Proof: Any estimator of $\omega_i(\nu)$ which satisfies Lemma 5.1 is valid. By definition,

$$MST_i^1(\nu) = \min\left(t_{i\pm n}(e_{i\pm n},\nu) + (n-1) * TTC_{(i\pm n)\_min}(\nu)\right) \quad \forall n : n \neq 0,\ i - n \ge 0,\ i + n < P$$

$$= \min\left(t_{i\pm1}(e_{i\pm1},\nu),\ \ t_{i\pm n}(e_{i\pm n},\nu) + (n-1) * TTC_{(i\pm n)\_min}(\nu)\right) \quad \forall n : n \neq 0,\ n \neq 1,\ i - n \ge 0,\ i + n < P$$

Also,

$$0 + (n-1) * TTC_{(i\pm n)\_min}(\nu) \le t_{i\pm n}(e_{i\pm n},\nu) + (n-1) * TTC_{(i\pm n)\_min}(\nu) \quad \text{if } t_{i\pm n}(e_{i\pm n},\nu) \ge 0 \tag{31}$$

The next event time is always greater than zero by definition of monotonicity of events;
therefore, the inequality of equation 31 is true. Furthermore,

$$(n-1) * TTC_{global\_min} \le (n-1) * TTC_{(i\pm n)\_min}(\nu) \quad \text{if } TTC_{global\_min} \le TTC_{(i\pm n)\_min}(\nu) \tag{32}$$

The global minimum time to cross is always less than or equal to $TTC_{(i\pm n)\_min}$ by Definition
5.10; therefore, the inequality of equation 32 is true. Last,

$$TTC_{global\_min} \le (n-1) * TTC_{global\_min} \quad (n \ge 2)$$

These substitutions reduce $MST_i^1(\nu)$ to $MST_i^2(\nu)$ and $MST_i^2(\nu) \le MST_i^1(\nu)$. Since
$MST_i^1(\nu)$ has been proven valid, then by Lemma 5.1, $MST_i^2(\nu)$ is valid.


Theorem 5.4: The estimator $MST_i^2(\nu)$ does not guarantee progress.

Proof: Given that $MST_i^2(\nu) = \min\left(t_{(i\pm1)}(e_{(i\pm1)},\nu),\ TTC_{global\_min}\right)$ from Definition 5.11,
suppose that $TTC_{global\_min} < t_i(e_i,\nu)$ for all i in {0, 1, 2, ..., P-1}. Then $MST_i^2(\nu) =
TTC_{global\_min}$ for all i by definition of $MST_i^2(\nu)$. If the minimum safe time is less than
the event time for every sector i, then no sector can execute its next event by Definition
5.5. The value of $TTC_{global\_min}$ is a constant by definition of conservation of energy and
momentum; therefore, for all values of $\nu$, $MST_i^2(\nu) = TTC_{global\_min}$. The next event times
for all sectors will remain unchanged because each value of $\nu$ produces the same results if
none of the pool balls ever change state information. For this example, the simulation will
run indefinitely while the simulation time will remain at zero!
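
The stall described in this proof can be reproduced with a few lines. The sketch below is a minimal illustration, assuming the estimator is the minimum of the adjacent sectors' next event times and the global minimum time to cross, and that a sector may execute when its next event time does not exceed its estimate. The numbers are borrowed from the four-sector example of Section 5.7.1; the function names are hypothetical.

```python
def mst2(net, ttc_global_min):
    """Second estimator: min of the neighbours' next event times and TTC_global_min."""
    p = len(net)
    result = []
    for i in range(p):
        # Only sectors i-1 and i+1 are consulted; end sectors have one neighbour.
        neighbours = [net[j] for j in (i - 1, i + 1) if 0 <= j < p]
        result.append(min(neighbours + [ttc_global_min]))
    return result


def executable(net, mst):
    """A sector may execute when its next event time is within its safe time."""
    return [t <= m for t, m in zip(net, mst)]


if __name__ == "__main__":
    net = [3.0, 2.0, 4.0, 2.5]   # next event time per sector
    estimate = mst2(net, 1.5)    # 1.5 is the global minimum time to cross
    print(estimate, executable(net, estimate))
```

Because no event executes, the inputs to the next iteration are identical and the same estimate is produced again, which is the lack of progress the theorem describes.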

5.6  Developing  a  Third  MST  Calculation 

The third MST formulation relies heavily upon Chandy and Misra's concept of input
arcs and output arcs. A sector i outputs a message to a neighboring sector which represents
the earliest estimated time that sector i can send a transient ball message to its neighbor.
This is an output arc from sector i to sector (i ± 1). Equivalently, each sector receives
a message sent from its neighbors as input arcs. The formulation for the third MST
estimator which adheres to Chandy and Misra's constraints has the following logic. If one
iteration ago at ($\nu$ - 1), sector (i + 1) passed a message to sector i indicating that no
transient ball messages will be sent before time $t_1$, then for the next iteration $\nu$, it must
still be true that sector (i + 1) will not send a transient ball message before time $t_1$ due to
monotonicity of events. Furthermore, if a pool ball were to cross sector (i + 1) into sector i
in no less than time $TTC_{global\_min}$, and one iteration ago at ($\nu$ - 1) sector (i + 1) could not
output a transient ball message until at least time $t_1$, then for iteration $\nu$, sector (i + 1)
cannot output a transient ball message to sector i until at least time $t_2 = t_1 + TTC_{global\_min}$.
This concept allows the MST estimator to constantly increase in size until at least one
sector can execute an event. After executing the event, there is no guarantee that an
event can be executed for iteration ($\nu$ + 1), but there is a guarantee that execution will
be possible before ($\nu$ + $\infty$) because the estimator itself constantly increases with $\nu$. The
potential to have non-executing iterations reduces the efficiency of the parallel simulation
and the lower estimate of the MST reduces the probability of multiple executing sectors;
however, the paradigm is scalable to M nodes where M is limited only by the hardware.
The following definitions support the development of the Chandy and Misra estimator and
the theorems that follow.
Definition 5.12: Let $O_{i,(i\pm1)}(\nu)$ be the output arc of sector i to sector (i ± 1) at some
discrete time interval $\nu$. Then,

$$O_{i,(i\pm1)}(\nu) = \min\left(t_i(e_i,\nu),\ I_{i,(i\mp1)}(\nu-1) + TTC_{global\_min}\right) \tag{33}$$

where $I_{i,(i\mp1)}(\nu-1)$ is the input arc to sector i from sector (i ∓ 1). The inputs and outputs
are related by $I_{i,(i\pm1)}(\nu) = O_{(i\pm1),i}(\nu)$, and all inputs and outputs at ($\nu$ = 0) = 0.


Definition 5.13: The estimator of $\omega_i(\nu)$, denoted $MST_i^3(\nu)$, is defined as

$$MST_i^3(\nu) = \min\left(I_{i,(i\pm1)}(\nu)\right) \tag{34}$$


Theorem 5.5: The estimator $MST_i^3(\nu)$ is valid.

Proof: To prove that $MST_i^3(\nu)$ is valid, it will be shown that for all $\nu$, $MST_i^3(\nu) \le
MST_i^1(\nu)$. This will be done by analyzing the specific instances for $\nu$ = 1 and 2 from
which a general solution for all $\nu$ clearly presents itself. The general solution is equivalent
to the combination of Definitions 5.12 and 5.13.


CASE 1: $\nu$ = 1

Through substitution, $MST_i^3(\nu = 1)$ can be equivalently written as $\min\left(O_{(i\pm1),i}(1)\right)$. Substituting
the output terms with Definition 5.12 results in the following equation:

$$MST_i^3(1) = \min\begin{pmatrix} \min\left(t_{i+1}(e_{i+1},1),\ TTC_{global\_min}\right), \\ \min\left(t_{i-1}(e_{i-1},1),\ TTC_{global\_min}\right) \end{pmatrix} = \min\begin{pmatrix} t_{i\pm1}(e_{i\pm1},1) + 0 * TTC_{global\_min}, \\ t_{i\pm2}(e_{i\pm2},0) + 1 * TTC_{global\_min} \end{pmatrix}$$

where $t_i(e_i,\nu = 0) = 0$ for all i.
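CASE 2: $\nu$ = 2

Using the same substitution steps, the following is seen to be true for $\nu$ = 2.

$$MST_i^3(2) = \min\begin{pmatrix} t_{i\pm1}(e_{i\pm1},2) + 0 * TTC_{global\_min}, \\ t_{i\pm2}(e_{i\pm2},1) + 1 * TTC_{global\_min}, \\ t_{i\pm3}(e_{i\pm3},0) + 2 * TTC_{global\_min} \end{pmatrix}$$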


GENERAL SOLUTION:

From the preceding two cases, the general solution for all $\nu$ is:

$$MST_i^3(\nu) = \min\left(t_{i\pm n}(e_{i\pm n},(\nu - n + 1)) + (n-1) * TTC_{global\_min}\right) \tag{35}$$

Equation 35 has a striking resemblance to the definition of $MST_i^1(\nu)$; however, it is
easily shown that $MST_i^3(\nu) \le MST_i^1(\nu)$ as follows. For all i, $TTC_{global\_min} \le TTC_{(i\pm n)\_min}(\nu)$
from Definition 5.10. Due to monotonicity of events, $t_{i\pm n}(e_{i\pm n},(\nu - n + 1)) \le t_{i\pm n}(e_{i\pm n},\nu)$
since $(\nu - n + 1) \le \nu$ for $(n \ge 1)$. Therefore, $MST_i^3(\nu) \le MST_i^1(\nu)$ which implies that
$MST_i^3(\nu)$ is valid.
Theorem 5.6: The estimator of $\omega_i(\nu)$, $MST_i^3(\nu)$, guarantees progress.

Proof: After substituting the definition of output arcs into the definition of $MST_i^3(\nu)$, the
estimator can be written as

$$MST_i^3(\nu) = \min\left(O_{(i+1),i}(\nu),\ O_{(i-1),i}(\nu)\right)$$

$$= \min\begin{pmatrix} \min\left[t_{i+1}(e_{i+1},\nu),\ O_{(i+2),(i+1)}(\nu-1) + TTC_{global\_min}\right], \\ \min\left[t_{i-1}(e_{i-1},\nu),\ O_{(i-2),(i-1)}(\nu-1) + TTC_{global\_min}\right] \end{pmatrix}$$

This equation clearly shows that the value of $MST_i^3(\nu)$ constantly increases because either
the next event time $t_{i\pm1}(e_{i\pm1},\nu)$ increases by definition of monotonicity of events, or the
term representing the output arc of previous iterations increases by a constant factor
$TTC_{global\_min}$. Therefore, execution of events is possible as $\nu$ increases.

5.7 Analyzing Alternative MST Calculations

$MST_i^1(\nu)$ requires global communications whereas $MST_i^3(\nu)$ does not. The communications
time for $MST_i^1(\nu)$ can be reduced to at most $O(M)$ where M is the number
of nodes. The communications time for $MST_i^3(\nu)$ is $O(1)$. Thus, scalability is a major
tradeoff. $MST_i^1(\nu)$ specifically requires (M - 1) communications per node while $MST_i^3(\nu)$
requires a constant two communications per node for all but the end sectors which require
only one. Thus, for M = 2, the two MST calculations require approximately the same
communications time. For M = 4, $MST_i^3(\nu)$ is perhaps slightly superior if all else is the
same. Clearly, all else is not the same because $MST_i^1(\nu)$ guarantees to execute at least
one event per iteration whereas $MST_i^3(\nu)$ does not. This factor constitutes the tradeoff
between execution rate and communications time complexity. Not only can $MST_i^1(\nu)$
guarantee to execute at least one event per iteration, but up to $\lceil P/2 \rceil$ sectors can execute
per iteration. A simple four sector example highlights the differences between the two
paradigms.
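
The per-node communication counts quoted above can be tabulated directly. The sketch below assumes one sector per node arranged in a linear array and counts only the messages a node must receive per iteration: (M - 1) for the global estimator, and two for the neighbor-only estimator except at the ends, which receive one. Both function names are hypothetical.

```python
def comms_global(m):
    """Messages per node per iteration when every node hears from every other."""
    return [m - 1] * m


def comms_neighbour(m):
    """Messages per node per iteration when only adjacent sectors communicate."""
    if m < 2:
        return [0] * m  # a single node has no neighbour to talk to
    return [1 if i in (0, m - 1) else 2 for i in range(m)]


if __name__ == "__main__":
    for m in (2, 4, 8):
        print(m, comms_global(m), comms_neighbour(m))
```

At M = 2 the two lists are identical, at M = 4 the neighbor-only scheme is slightly cheaper, and by M = 8 the global scheme needs seven messages per node against at most two.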


Conjecture 5.1: For small M, where M is the number of nodes in the pool balls simulation,
$MST_i^1(\nu)$ is superior to $MST_i^3(\nu)$. The size of M has been shown to be at least 4.

5.7.1 An Example Using Both MSTs  To demonstrate the implementation of both
conservative paradigms, consider the four node, four sector pool balls simulation diagram
of Figure 12. The terms 'NET' and 'TTC' represent the next event time and time to cross
a sector for the fastest ball in sector i. The simulation time for each sector is currently
0.00 seconds.


Figure  12.  A  Four  Node,  Four  Sector  Process  Graph 


Using the $MST_i^1(\nu)$ with Global Communications

Each message sent from sector i to sector k has the form $(NET_i, TTC_i, NULL)$.

Node 0 sends (3.0, 2.0, NULL) to Node < 1,2,3 >

Node 1 sends (2.0, 2.5, NULL) to Node < 0,2,3 >

Node 2 sends (4.0, 1.5, NULL) to Node < 0,1,3 >

Node 3 sends (2.5, 3.0, NULL) to Node < 0,1,2 >


$MST_0 = \min\{(2.0 + 0 * 2.5),\ (4.0 + 1 * 1.5),\ (2.5 + 2 * 3.0)\} = 2.0$

$MST_1 = \min\{(3.0 + 0 * 2.0),\ (4.0 + 0 * 1.5),\ (2.5 + 1 * 3.0)\} = 3.0$

$MST_2 = \min\{(3.0 + 1 * 2.0),\ (2.0 + 0 * 2.5),\ (2.5 + 0 * 3.0)\} = 2.0$

$MST_3 = \min\{(3.0 + 2 * 2.0),\ (2.0 + 1 * 2.5),\ (4.0 + 0 * 1.5)\} = 4.0$


$\therefore\ \neg P(E_0),\ P(E_1),\ \neg P(E_2),\ P(E_3)$
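
The four calculations above follow one pattern, which the sketch below captures. It assumes the global estimator takes, for every other sector j, that sector's next event time plus its time to cross multiplied by the number of sectors lying strictly between j and i. The function name `mst1` and the list layout are hypothetical; the input values are those of the example.

```python
def mst1(net, ttc):
    """Global estimator: for sector i, min over j != i of
    net[j] + (|i - j| - 1) * ttc[j].

    net -- next event time per sector
    ttc -- time to cross for the fastest ball in each sector
    Requires at least two sectors.
    """
    p = len(net)
    result = []
    for i in range(p):
        terms = [net[j] + (abs(i - j) - 1) * ttc[j] for j in range(p) if j != i]
        result.append(min(terms))
    return result


if __name__ == "__main__":
    net = [3.0, 2.0, 4.0, 2.5]
    ttc = [2.0, 2.5, 1.5, 3.0]
    estimate = mst1(net, ttc)
    print(estimate)
    # A sector may execute when its next event time is within its estimate.
    print([t <= m for t, m in zip(net, estimate)])
```

With these inputs the estimates are 2.0, 3.0, 2.0 and 4.0, so sectors 1 and 3 may execute while sectors 0 and 2 must wait.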
Using the $MST_i^3(\nu)$ with Constant Communications

For $MST_i^3(\nu)$, the global minimum TTC must be known a priori. For the scenario of
Figure 12, the minimum TTC is 1.5 for all sectors.

LOOP 1

$O_{0,1}(1) = \min(3.0, 1.5) = 1.5$
$O_{1,2}(1) = \min(2.0, 1.5) = 1.5$
$O_{2,3}(1) = \min(4.0, 1.5) = 1.5$
$O_{3,2}(1) = \min(2.5, 1.5) = 1.5$
$O_{2,1}(1) = \min(4.0, 1.5) = 1.5$
$O_{1,0}(1) = \min(2.0, 1.5) = 1.5$


$MST_0(1) = \min(1.5) = 1.5$
$MST_1(1) = \min(1.5, 1.5) = 1.5$
$MST_2(1) = \min(1.5, 1.5) = 1.5$
$MST_3(1) = \min(1.5) = 1.5$


$\therefore\ \neg P(E_0),\ \neg P(E_1),\ \neg P(E_2),\ \neg P(E_3)$

At this point, the first high level loop construct has finished. Each node calculated
a candidate next event, determined its MST and executed its next event for all sectors
provided NET ≤ MST. This example shows that, using $MST_i^3(\nu)$, none of the sectors could
safely execute an event based upon the information provided. The estimator $MST_i^1(\nu)$ executed
two events on two nodes thereby achieving 50% parallelism during the execution phase.
This was possible because each node had additional information upon which to calculate
a superior MST at the cost of increased communications. The estimator $MST_i^3(\nu)$ must
implement a second high level loop to attain the same simulation state, shown as follows.

LOOP 2

$O_{0,1}(2) = \min(3.0, (1.5 + 1.5)) = 3.0$
$O_{1,2}(2) = \min(2.0, (1.5 + 1.5)) = 2.0$
$O_{2,3}(2) = \min(4.0, (1.5 + 1.5)) = 3.0$
$O_{3,2}(2) = \min(2.5, (1.5 + 1.5)) = 2.5$
$O_{2,1}(2) = \min(4.0, (1.5 + 1.5)) = 3.0$
$O_{1,0}(2) = \min(2.0, (1.5 + 1.5)) = 2.0$


$MST_0(2) = \min(2.0) = 2.0$

$MST_1(2) = \min(3.0, 3.0) = 3.0$

$MST_2(2) = \min(2.0, 2.5) = 2.0$

$MST_3(2) = \min(3.0) = 3.0$


$\therefore\ \neg P(E_0),\ P(E_1),\ \neg P(E_2),\ P(E_3)$
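
The two loops above can be replayed with a short iteration. The sketch below assumes each output arc is the minimum of the sending sector's next event time and the previous input arc from its far side plus the global minimum time to cross, with all arcs starting at zero. One further assumption is made purely to reproduce the worked numbers: an end sector, which has no far-side neighbor, is fed a virtual arc that advances by the global minimum time to cross each loop. Next event times are held fixed because no event executes in loop 1. All names are hypothetical.

```python
def mst3_loops(net, ttc_global_min, loops):
    """Replay the neighbour-only estimator for a number of loops.

    Returns one list of per-sector estimates per loop. Requires >= 2 sectors.
    """
    p = len(net)
    prev_right = [0.0] * p  # arc from sector i to i+1 in the previous loop
    prev_left = [0.0] * p   # arc from sector i to i-1 in the previous loop
    virtual = 0.0           # assumed arc arriving from beyond an end wall
    history = []
    for _ in range(loops):
        right, left = [], []
        for i in range(p):
            from_left = prev_right[i - 1] if i > 0 else virtual
            from_right = prev_left[i + 1] if i < p - 1 else virtual
            # A ball arriving from one side must cross sector i before it
            # can leave on the other side.
            right.append(min(net[i], from_left + ttc_global_min))
            left.append(min(net[i], from_right + ttc_global_min))
        estimate = []
        for i in range(p):
            inputs = []
            if i > 0:
                inputs.append(right[i - 1])
            if i < p - 1:
                inputs.append(left[i + 1])
            estimate.append(min(inputs))
        history.append(estimate)
        prev_right, prev_left = right, left
        virtual += ttc_global_min
    return history


if __name__ == "__main__":
    for loop, estimate in enumerate(mst3_loops([3.0, 2.0, 4.0, 2.5], 1.5, 2), 1):
        print(loop, estimate)
```

The first loop yields 1.5 for every sector and the second yields 2.0, 3.0, 2.0 and 3.0, so sectors 1 and 3 become executable only after the extra loop.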

The scenario above illustrates the typical 'wind up' overhead of conservative simulations
implemented with Chandy-Misra. The number of loops required to reach an
executable state depends upon the difference between $TTC_{min}$ and $NET_{min}$. The scenario
presented incorporated a $\Delta$ small enough that the wind up cost consisted of only one
loop; however, this will not always be the case. Even after the windup is finished, the
proof presented earlier validates the possibility that $\forall k : MST_k < t_k$.

5.8 Selecting an MST Formulation

Both equations for calculating the minimum safe time were considered for this thesis
effort. The estimator $MST_i^1(\nu)$ seemed to be more intrinsically programmable and AFIT
is currently limited to an eight node hypercube which favors $MST_i^1(\nu)$ due to small cube
size. No attempt was made to implement both strategies so empirical data is not available
to date. The following sections describe the implementation strategy used to incorporate
$MST_i^1(\nu)$.


5.8.1 Implementing the Minimum Safe Time  Implementing Definition 3.2 is straightforward.
Two message communications data structures were defined. Both data structures
have the following fields:

1. Sector Number

2. Candidate Next Event Time

3. Time to Cross (for the fastest ball in Sector i)

4. Time to be affected by any left sector.

5. Time to be affected by any right sector.

6. $MST_i$

Sector zero completes fields one through four. Field four is infinity (defined in a header
file to be 99999999.99 seconds) since sector 0 has no left neighbor. This structure is next
passed to sector 1. Sector 1 fills in fields one through four. This structure is then passed
to sector 2. This series continues until sector (P-1) receives the data structure. At this
point, every sector (0 through P-1) knows the earliest that it can receive a transient ball message
from any sector to its left. As this entire process takes place, sector (P-1) sends a data
structure to its neighbor (P-2) in parallel. Sector (P-1) assigns infinity to the time to be
affected by any right sector as it has no right neighbor. Assuming near equal processing
time, sector (P-1) will receive its message originated by sector 0 at the same time that
sector 0 receives its message originated by sector (P-1). At this point, every sector now
knows the earliest time at which it can receive a transient ball message from either the
left or the right. The MST is simply the minimum of these two values.
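
The two opposing passes can be sketched as follows. This is not the six-field structure described above: as an assumption for illustration, the message here carries one running arrival time per upstream sector, each advanced by that sector's own time to cross at every hop, and `math.inf` stands in for the header-file constant. The names `sweep` and `mst1_by_sweeps` are hypothetical.

```python
import math


def sweep(net, ttc):
    """One pass from the first sector to the last.

    Returns, per sector, the earliest time it can be affected by any sector
    earlier in the pass (infinity for the first sector).
    """
    affected = []
    in_flight = []  # (earliest arrival at the receiving sector, source's time to cross)
    for next_event, cross in zip(net, ttc):
        affected.append(min((a for a, _ in in_flight), default=math.inf))
        # Forward the message: every upstream ball needs one more crossing,
        # and this sector appends its own candidate next event time.
        in_flight = [(a + c, c) for a, c in in_flight] + [(next_event, cross)]
    return affected


def mst1_by_sweeps(net, ttc):
    """Combine a left-to-right pass and a right-to-left pass."""
    from_left = sweep(net, ttc)
    from_right = sweep(net[::-1], ttc[::-1])[::-1]
    return [min(left, right) for left, right in zip(from_left, from_right)]


if __name__ == "__main__":
    print(mst1_by_sweeps([3.0, 2.0, 4.0, 2.5], [2.0, 2.5, 1.5, 3.0]))
```

Run on the four-sector example of Section 5.7.1, the two passes give the same estimates as the direct global calculation (2.0, 3.0, 2.0, 4.0) while each sector talks only to its immediate neighbors.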

5.8.2 Implementing Wieland's Data Replication Strategy  The data replication strategy
remains basically unchanged from the partitioned sequential version. Pool ball objects
must still transition from one sector to an adjacent sector in a two step process via PARTITION
events and EXIT events. If adjacent sectors are collocated on one node, data replication
may take place directly through memory as described in Section 4.3.4. If $sector_i$'s
adjacent sector resides on a different node, the rules for data replication presented in Section
4.3.4 must be enforced through discrete message passing. It is imperative that each
sector have its data replication updated prior to determining its next event; otherwise,
incorrect events can occur. For example, $sector_i$ could have a replicated ball object stored
in its ball object manager. This replicated ball could be scheduled for a collision with another
of $sector_i$'s ball objects. If the replicated ball was previously moved by the owning
sector (i.e. $sector_{i-1}$ or $sector_{i+1}$), and if $sector_i$ had not updated its replicated copy, the
predicted collision event would be in error. In fact, the case could arise that the replicated
ball should not even be visible to $sector_i$ had the update been enforced. This condition
requires that every sector wait to determine candidate events until data replication has
been completed.

To  implement  the  synchronous  waiting  condition,  every  node  sends  a  message  counter 
to  the  adjacent  node(s)  stating  the  number  of  data  replications  that  will  occur.  If  a  node 
has  no  ball  objects  to  send,  the  message  counter  sent  equals  zero.  In  this  fashion,  each 
node can simultaneously execute scheduled events immediately followed by sending data
replication  message  counters  to  adjacent  nodes.  At  this  point,  lockstep  synchronization  is 
again  enforced  as  no  node  may  continue  processing  until  it  has  received  message  counters 
from  all  adjacent  nodes.  Once  a  node  receives  a  message  counter  from  an  adjacent  node, 
that  node  is  able  to  receive  the  proper  number  of  replicated  object  commands. 
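
The counter-then-payload exchange can be illustrated with an in-memory queue standing in for the link between two adjacent nodes. This is a minimal sketch, not the hypercube message passing used in the thesis: blocking receives are not modeled, and the function names and ball labels are hypothetical.

```python
from collections import deque


def send_replications(channel, balls):
    """Send the message counter first, then one message per replicated ball."""
    channel.append(len(balls))
    channel.extend(balls)


def receive_replications(channel):
    """Read the counter, then read exactly that many replication messages."""
    count = channel.popleft()
    return [channel.popleft() for _ in range(count)]


if __name__ == "__main__":
    link = deque()                                # one direction of a node-to-node link
    send_replications(link, ["ball7", "ball9"])   # this loop: two replications
    send_replications(link, [])                   # next loop: counter of zero
    print(receive_replications(link))
    print(receive_replications(link))
```

Because a counter is always sent, even when it is zero, the receiver knows when it has everything for the current loop and can proceed to determine its candidate events.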

5.9 Summary

The design of the pool balls simulation is object oriented. The equations of motion
conform to the laws of physics for elastic collisions (i.e. conservation of energy and
momentum) and frictionless motion. The parallel design of the pool balls simulation incorporated
Schruben's concept of resident entities thereby modeling the table as a distributed
set of table sectors. This has been shown to reduce the amount of communications over a
transient entity design approach. The paradigm developed for the synchronization of the
distributed simulation has been shown to be conservative. This conservative paradigm
is superior to that proposed by Chandy and Misra for small N. Both this paradigm and
Chandy-Misra's paradigm avoid the possibility of deadlock via NULL message passing.
With both paradigms, improved parallelism can be achieved by assigning multiple sectors
(LPs) to individual nodes. The upper bound on the number of sectors that can safely
execute a candidate event using the paradigm developed in this thesis is $\lceil P/2 \rceil$; therefore,
100% parallelism is possible if and only if each node has assigned to it two or more table
sectors.
VI. Test Results


6.1 Introduction

Chapter  VI  outlines  the  measures  taken  in  this  thesis  effort  to  validate  the  pool 
balls  software  and  the  test  procedures  used  to  generate  performance  data.  The  results  of 
the  tests  are  discussed  from  which  conclusions  are  made.  The  conclusions  are  stated  in 
Chapter  VII. 

6.2  Verification  and  Validation 

The pool balls simulation was designed and implemented in three major steps consisting
of the following:

1.  Design  and  implementation  of  a  sequential  simulation  without  spatial  partitioning 
and  data  replication. 

2. Design and implementation of a sequential simulation incorporating spatial partitioning
and data replication.

3.  Design  and  implementation  of  a  parallel  simulation  incorporating  spatial  partitioning 
and  data  replication. 

The first sequential simulation was validated in several stages consisting of the following
tests:

1. Test a collision between a pool ball and a cushion (both horizontal and vertical).

(a) Create a scenario with known behavior. Force the simulation to produce the
specified pool balls (i.e. positions, times and velocities) and compare the simulation
results with expected results.

(b)  Enable  the  simulation’s  random  number  generator  to  produce  a  random  pool 
ball  and  collisions.  Record  the  events  to  disk  and  verify  output  by  calculating 
each  event  by  hand. 

2.  Test  a  collision  between  two  pool  balls. 

(a) Create a scenario with known behavior. Force the simulation to produce the
specified pool balls (i.e. positions, times and velocities) and compare the simulation
results with expected results.

(b) Enable the simulation's random number generator to produce random pool balls
and collisions. Record the events to disk and verify the output by calculating
each event by hand.

Test 1a consisted of creating a single pool ball scenario on paper with known time,
position  and  velocity.  The  random  number  generator  was  disabled  to  allow  the  creation 
of  a  pre-specified  pool  ball.  Four  separate  tests  were  run  to  verify  correct  operation  of  a 
left  and  right  vertical  cushion  collision  and  a  top  and  bottom  horizontal  cushion  collision. 
The  simulation  output  was  compared  against  the  hand  calculated  results. 

Test 1b consisted of creating a randomly generated pool ball and simulating 25
events. With only one pool ball, all 25 events were guaranteed to be limited to horizontal
and vertical cushion events. The simulation output was checked by hand for all 25 events.
This test was performed three times to produce a high level of confidence.

Test 2a consisted of creating various scenarios involving two or more pool balls prepositioned
to intentionally produce pool ball collisions at known times. The random number
generator was disabled to allow creation of deterministic inputs. The simulation output
was verified by comparing each collision (including the cushion collisions) with the expected
hand calculated results.

Test  2b  consisted  of  creating  randomly  generated  pool  balls  and  simulating  10  events. 
Each  event  was  verified  by  hand.  This  test  was  performed  five  times  with  varying  random 
inputs  and  number  of  pool  balls. 

Large quantities of pool balls as well as large quantities of events were not possible
to test due to the labor intensive calculations required for comparison. While these limited
tests do not prove system correctness, the test results produce a high level of confidence
in system correctness.

The sequential simulation incorporating spatial partitioning and data replication
was  validated  by  comparing  the  output  against  the  output  of  the  first  simulation  under 
the  following  constraints: 

1.  The  number  of  pool  balls  and  initial  conditions  for  both  simulations  were  equal. 

2.  Partition  and  Exit  events  were  not  recorded  to  disk. 

3. The user specified simulation time was the same for both simulations.
Since the pseudo random number generator uses a seed without reference to a system
clock,  each  simulation  test  run  produces  the  exact  same  initial  conditions  provided  the 
seed  remains  unchanged.  In  this  manner,  the  number  of  pool  balls  could  be  specified  for 
both  simulation  software  versions  resulting  in  identical  initial  conditions.  By  not  recording 
PARTITION  and  EXIT  events  to  disk,  the  test  outputs  from  both  simulation  versions 
should  have  been  the  same.  This  was  verified  by  using  the  Unix  ‘diff’  command  on  the  two 
output  files.  To  gain  further  confidence  in  valid  system  operation,  sector  crossings  were 
verified  by  hand  for  25  different  pool  balls  corresponding  to  25  different  discrete  points  in 
simulation  time. 
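
The output comparison can be sketched with Python's standard `difflib` in place of the Unix `diff` command. The event log format below, with the event type as the first field of each line, is purely hypothetical; the thesis does not show its output format. PARTITION and EXIT lines are filtered here, which has the same effect as not recording them.

```python
import difflib

IGNORED = {"PARTITION", "EXIT"}


def comparable_events(lines):
    """Drop PARTITION and EXIT records so both versions log the same events."""
    kept = []
    for line in lines:
        fields = line.split()
        if fields and fields[0] in IGNORED:
            continue
        kept.append(line)
    return kept


def event_diff(log_a, log_b):
    """Return unified-diff lines; an empty list means the logs agree."""
    return list(
        difflib.unified_diff(
            comparable_events(log_a), comparable_events(log_b), lineterm=""
        )
    )


if __name__ == "__main__":
    unpartitioned = ["CUSHION 0.50 ball3", "COLLISION 0.75 ball1 ball2"]
    partitioned = [
        "CUSHION 0.50 ball3",
        "PARTITION 0.60 ball1",
        "EXIT 0.70 ball1",
        "COLLISION 0.75 ball1 ball2",
    ]
    print(event_diff(unpartitioned, partitioned))
```

An empty result corresponds to `diff` reporting no differences between the two output files.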

The  parallel  simulation  version  was  tested  against  the  original  sequential  simulation 
and  against  the  partitioned  sequential  version  in  that  order.  The  first  series  of  tests  were 
identical  to  those  discussed  above.  The  Unix  ‘diff’  command  was  used  to  highlight  any 
differences  in  simulation  outputs  between  the  sequential,  non-partitioned  version  and  the 
parallel  version.  A  lengthy  test  consisting  of  100  pool  balls  and  a  simulation  time  of  60 
seconds  was  used  as  a  final  test.  During  the  second  series  of  tests,  all  PARTITION  and 
EXIT  events  were  included  in  the  simulation  output.  The  sequential  and  parallel  software 
versions  were  compared  against  each  other  to  test  the  functionality  of  the  border  crossings. 
Again,  100  pool  balls  were  simulated  for  60  seconds.  The  Unix  ‘diff’  command  did  not 
produce any differences between the two partitionable software versions.

6.3 Simulation Performance Test Plan

Several  parameters  were  available  to  vary.  It  was  desirable  to  gain  insight  into  the 
performance  of  the  implemented  design  as  the  parameters  change.  Scalability  in  terms  of 
cube  size  and  load  factor  performance  are  two  qualities  of  particular  interest.  The  following 
section  defines  the  variables  of  interest  which  were  scrutinized  during  the  test  phase. 

6.3.1 Defining the Variables of Interest  The number of nodes is a variable of interest
without which speedup calculations are impossible. Therefore, all test cases defined
must be duplicated for various node configurations. The software design imposed the constraint
that the number of nodes selected must be a power of two. AFIT has an eight node
hypercube; therefore, four test runs must be made for any given test case corresponding
to one, two, four and eight nodes.

The number of pool balls is a variable of interest. Changing this variable allows
inspection and analysis of the relationship between speedup and computational loading.
The algorithm design and analysis phase predicted that the speedup should generally
improve with increased loading due to the overall O(N^2) algorithm. Furthermore, as the
loading increases, there is a greater probability that each sector will contain one or more
pool balls and therefore produce useful candidate events which may be executed in parallel
with other useful events.

The  number  of  sectors  is  a  variable  of  interest.  The  algorithm  design  and  analysis 
phase  predicted  that  the  run  time  performance  can  improve  with  increased  sectoring.  If 
this  is  in  fact  true,  then  sectoring  becomes  crucial  in  the  calculation  of  speedup.  Since 
speedup  relates  the  parallel  run  time  to  the  best  sequential  time,  the  optimum  sectoring  on 
a  single  node  must  be  determined.  There  was  no  technique  available  in  the  analysis  phase 
to  determine  a  priori  the  optimum  sectoring  for  a  given  quantity  of  pool  balls;  therefore, 
sectoring must be a variable if the optimum sectoring is to be found.
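
A short sketch of why the single-node optimum matters for the speedup calculation is given below. It is illustrative only: the run times are hypothetical numbers, the function names are not from the thesis software, and it assumes the parallel time is likewise taken at its best sectoring. Efficiency is computed with the standard definition, speedup divided by the number of nodes.

```python
def best_time(times_by_sectors):
    """Smallest run time over all sectorings tried for one node count."""
    return min(times_by_sectors.values())

def speedup(run_times, nodes):
    """Best single-node time divided by the best time on `nodes` nodes."""
    return best_time(run_times[1]) / best_time(run_times[nodes])

def efficiency(run_times, nodes):
    """Speedup divided by the number of nodes."""
    return speedup(run_times, nodes) / nodes

# Hypothetical run times in msecs, laid out as {nodes: {sectors: time}}.
run_times = {
    1: {1: 9000.0, 8: 6000.0, 16: 6400.0},
    4: {8: 2400.0, 16: 2000.0},
}

print(speedup(run_times, 4))      # 6000 / 2000 = 3.0
print(efficiency(run_times, 4))   # 3.0 / 4 = 0.75
```

Had the unsectored single-node time (9000 msecs in this made-up data) been used as the baseline, the speedup would have been overstated as 4.5.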

6.3.2  Defining  the  Constants  As  variables  of  interest  are  changed  from  test  to  test, 
the  simulation  run  time  and  pool  table  dimensions  must  remain  constant  to  produce  any 
meaningful  results.  The  simulation  run  time  was  set  to  2.00  seconds.  This  time  was  selected 
based  upon  the  results  of  some  trial  experiments.  Using  500  pool  balls  on  a  single  node, 
the  test  run  required  approximately  four  hours  of  wall  clock  time.  Using  10  pool  balls,  the 
test run required approximately one minute. This range seemed reasonable based upon
the time constraints of this thesis effort.

The table dimensions were set to 1024 x 512 inches. The width of 512 was arbitrary.
The length of 1024 was selected to provide a reasonable degree of sectoring capability.
Given a one inch pool ball radius (arbitrarily selected), a length of 1024 inches allows up
to 256 sectors of equal size such that no two sectors overlap and a pool ball can reside in
a sector without overhanging into a border region between sectors.

6.3.3  The Test Plan  The quantity of pool balls was tested at values of 10, 20, 30,
40, 50, 100 and 200. The range of sectors to implement was 1 to 68 based upon some
sample test runs. The two, four and eight node tests varied the number of sectors by an
even multiple of the number of nodes. The single node tests could have varied the number
of sectors between 1 and 68 in multiples of one; however, multiples of two were arbitrarily
selected to save time. After the initial series of tests was finished, additional tests were
added. The performance curves were later extended by performing test runs at 300 and
400 pool balls. The quantities 120 and 160 were added to be able to compare test results
with those of Cal Tech. The results of the first seven tests are shown as figures 13 through
19.
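
The initial test matrix can be enumerated as in the sketch below. Two points are assumptions rather than statements from the text: the single node series is taken to include one sector followed by multiples of two, and "even multiple of the number of nodes" is read as whole multiples, so that every node holds the same number of sectors. All names are hypothetical.

```python
NODE_COUNTS = (1, 2, 4, 8)                     # powers of two on the eight node cube
BALL_COUNTS = (10, 20, 30, 40, 50, 100, 200)   # initial series of tests
MAX_SECTORS = 68

def sector_counts(nodes):
    """Sector counts tried for a given node count (assumed interpretation)."""
    if nodes == 1:
        # One sector, then multiples of two up to the limit.
        return [1] + list(range(2, MAX_SECTORS + 1, 2))
    # Assumption: whole multiples of the node count.
    return list(range(nodes, MAX_SECTORS + 1, nodes))

def test_matrix():
    """Every (nodes, balls, sectors) combination in the initial plan."""
    return [(nodes, balls, sectors)
            for nodes in NODE_COUNTS
            for balls in BALL_COUNTS
            for sectors in sector_counts(nodes)]

print(sector_counts(8))   # [8, 16, 24, 32, 40, 48, 56, 64]
print(len(test_matrix()))
```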


6.4  Test  Results 

6.4.1  Analysis of Pool Table Sectoring  Figures 13 through 19 graphically illustrate
the effect of sectoring the pool table on one, two, four and eight nodes for various levels
of computational loading. The execution time of figures 13 and 14 does not decrease with
increased sectoring. This is due to the fact that the gains from adding additional sectors
are outweighed by the increase in intermediate events. Recall that the only real events of
interest consist of collisions between pool balls and table cushions and collisions between
different pool balls. A pool ball can traverse the entire table in one step using a single
partition provided there are no pool ball collisions. Each additional sector adds two events
(PARTITION and EXIT) for a pool ball to traverse the table. Hence, increasing the
number of sectors increases the total number of events to process. Although the number
of events to process increases, the time to determine the next event decreases with P,
where P is the number of table sectors. As the number of sectors increases, the average
number of pool balls in any sector decreases. If the average density decreases below one
pool ball per sector, then there will be at least one sector which has no pool balls in
it. Adding more sectors beyond this limit will therefore not decrease the search space;
however, the total number of events to process will continue to increase. The tradeoff
between decreasing the search space and increasing the total number of events determines
the optimum number of sectors. When using multiple nodes, the tradeoff is less intuitive
because of the effects described in Theorem 5.2. This theorem states that no two adjacent
sectors can both meet their minimum safe times provided the next event times are not
equal. Adding more sectors on multiple nodes therefore increases the probability that a
node has at least one executable process from Theorem 5.3. The upper limit for increasing
performance by adding additional sectors is still a density of one pool ball per sector, since
adding additional sectors beyond this point will not give a node a greater probability of
executing a useful process.
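
The two quantities in this tradeoff can be written down directly. The sketch below is only a restatement of the counting argument, not a performance model: a ball crossing the table without collisions incurs two overhead events per additional sector, and the density limit is one ball per sector. Function names are hypothetical.

```python
def crossing_events(sectors):
    """PARTITION and EXIT events for one ball to traverse the table, no collisions."""
    return 2 * (sectors - 1)

def average_density(balls, sectors):
    """Average number of pool balls per sector."""
    return balls / sectors

def sectoring_is_saturated(balls, sectors):
    """True once density drops below one ball per sector, the limit named in the text."""
    return average_density(balls, sectors) < 1.0

for p in (1, 8, 64, 128):
    print(p, crossing_events(p), average_density(100, p),
          sectoring_is_saturated(100, p))
```

For 100 pool balls the overhead per traversal keeps growing with the sector count, while the density limit is passed somewhere between 64 and 128 sectors.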

Figure 18 has more of a parabolic shape. Notice that as the number of pool balls
(loading factor) increases, sectoring the pool table increases in importance. Figure 19
appears to be more of a relationship; however, it is conjectured that the family of curves
eventually rises with increased sectoring.


Figures 13 through 19 plot Simulation Run Time (msecs).

Figure 13.  Performance Curves for 10 Balls

Figure 14.  Performance Curves for 20 Balls

Figure 15.  Performance Curves for 30 Balls

Figure 16.  Performance Curves for 40 Balls

Figure 17.  Performance Curves for 50 Balls

Figure 18.  Performance Curves for 100 Balls

Figure 19.  Performance Curves for 200 Balls


The curves of figure 20 indicate that the speedup for any cube size approaches an
asymptotic limit regardless of load factor. This was surprising because it was expected
that the speedup would generally improve with N. Since the algorithm time complexity
is O(N^2), increasing the number of pool balls should increase the computations to
communications ratio, thereby favoring the coarse granularity of the iPSC/2 hypercube. It is
conjectured that this is not the case because the optimal number of sectors increases with
N; therefore, the search space only increases with O(N^2/P). Furthermore, as the optimum
number of sectors increases, the percentage of parallelism increases due to the effects of
Lemmas 5.4 and 5.5. The curves of figure 21 show the same data in the more traditional
format.

The efficiency (speedup divided by the number of nodes) is shown in figures 22 and 23.
The curves of figure 22 show the relationship between efficiency and load factor. The
efficiency appears to approach a limit for each cube size regardless of the number of
objects to simulate. Both figures 22 and 23 clearly show that the efficiency decreases with
increasing cube size. This is not surprising because the analysis phase of Chapter V stated
that the estimator MST_1(ν) is not scalable due to the global communications. It is
reasonable to assume that the efficiency will continue to decrease beyond a cube size of
three, although this cannot be tested with AFIT’s eight node hypercube.

Figure 20.  Speedup Curves as a Function of Load Factor

Figure 21.  Speedup Curves as a Function of Cube Size

Figure 22.  Efficiency Curves as a Function of Load Factor

Figure 23.  Efficiency Curves as a Function of Cube Size

Analysis of the data for optimum sectoring reveals an interesting trend. After 50
pool balls, the trend appears to establish a relationship between the optimum sectoring
for all nodes and the number of pool balls per sector (density). The curves are shown
as Fig 24. In fact, these curves were used to tailor the test plan for the 300 and 400
pool ball trials in an effort to reduce the time to complete the test runs. It was not
surprising that the optimum sectoring is related to the pool ball density due to the tradeoffs
discussed earlier in Chapter VI; however, another possibility exists. As the number of pool
balls increases, the total number of events to process also increases. These events can be
divided into two categories: events which are internal to a node and events which require
communications. These correspond to the subsets {VERTICAL, HORIZONTAL, COLL}
and {PARTITION, EXIT} respectively. Currently, the software keeps track of events by
type (i.e. VERTICAL, HORIZONTAL, etc.); however, with multiple sectors per node, not
every PARTITION and EXIT event requires communications via message passing. Should
a relationship exist between optimum sectoring and the ratio of internal event processing
to external event processing, then a conclusion can be made regarding all simulations
incorporating a two dimensional spatial partitioning scheme. Analysis of this relationship
has not yet been accomplished.

Figure 24.  Density Curves as a Function of Load Factor


6.5  Comparison of AFIT and Cal Tech Simulation Results

Cal Tech developed the original pool balls simulation concept. Their research
revolved around the Time Warp optimistic synchronization paradigm. The simulations
developed by AFIT and Cal Tech are very similar, but not identical; therefore, comparisons
are limited to general trends. The software designs for the two systems are different in
several respects.

1.  Each pool ball in AFIT’s program has identical radius while Cal Tech’s program
allows for varying radii.

2.  Each pool ball in AFIT’s program has identical mass while Cal Tech’s program allows
for varying mass.

3.  The AFIT software design establishes the entire table as a single object. Cal Tech’s
software design divides the four table cushions into sections, each of which is an
object.

4.  Cal Tech describes each pool ball (or puck) to be a separate object whereas the AFIT
design defines a class of pool balls and a single ball manager object (3).

5.  Cal Tech chose the more efficient O(N) algorithm to manage a next event queue
containing events. The AFIT simulation algorithm is O(N^2) and avoids complex
event list manipulation by storing on average only one event at a time.

The first four differences enumerated above should not affect the test results for either
system in any appreciable manner. These differences represent implementation decisions.
The implementation differences will probably result in different execution times, but relative
speedup measurements enable valid comparisons. Item five, however, represents a
significant design difference. As the Intel hypercube is coarse grain, the O(N^2) algorithm
provides a better match between software and hardware. Thus, one would expect that
an O(N^2) simulation would result in superior speedup over an O(N) algorithm, all else
remaining the same. This highlights another potentially significant difference between the
two simulations; that is, the hardware is not the same. Cal Tech used the JPL Mark III
hypercube whereas AFIT used the Intel iPSC/2 hypercube. With differing granularities,
the two machines will produce different speedup results even for the same software. During
the course of this research, no subjective measures of granularity were found to adequately
compare the two machines.

Cal Tech reported in the 1988 SCS Multiconference that speedup on eight nodes for
120 pool balls was approximately 3.2 (3:5). The simulation run time was not specified
nor was the sectoring specified. There was no mention concerning optimal sectoring for
the single node and 8 node configuration. For this thesis, a 120 pool ball scenario was
generated for two seconds simulation time. The speedup on eight nodes was 4.97. The
graph is shown as Fig 25. Cal Tech also published the speedup results using 160 pool
balls. Their speedup was approximately 4.0 on eight nodes using 64 table sectors. From
the reported test, it appears that Cal Tech did not use optimum table sectoring to calculate
speedup. They chose instead to fix the sectoring for each of the cube sizes from one to
32 nodes. Test results using 160 pool balls were reported using 16, 32 and 64 sectors, for
which the 64 sector table produced the best of the three speedup results for all cube sizes.
The AFIT simulation produced speedup of 5.40 for 160 pool balls on 8 nodes.
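
The relative difference between the quoted eight node speedups can be worked out as below. The four speedup values are the ones stated in the text (the Cal Tech values are approximate); the dictionary layout and function name are illustrative only, and the result carries all the caveats about differing hardware and sectoring noted above.

```python
# Eight node speedups quoted in the text; Cal Tech values are approximate.
reported = {
    120: {"afit": 4.97, "caltech": 3.2},
    160: {"afit": 5.40, "caltech": 4.0},
}

def percent_higher(afit, caltech):
    """How much higher the AFIT speedup is, as a percentage of the Cal Tech value."""
    return (afit / caltech - 1.0) * 100.0

for balls, s in reported.items():
    print(balls, round(percent_higher(s["afit"], s["caltech"])))
# 120 pool balls: about 55 percent higher; 160 pool balls: about 35 percent higher
```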

Figure 25.  AFIT Speedup Curves for 120 & 160 Pool Balls


VII.  Conclusions and Recommendations


7.1  Introduction

This chapter discusses the conclusions which can be made from the results of Chapter VI.
These conclusions are conjectured to apply not only to the pool balls simulation
but also to any simulation which incorporates a two or three dimensional space through
which the objects of study move or pass. Examples include particle dynamics and battlefield
simulations.

7.2  Impact  of  Computational  Load 

Speedup is independent of the number of objects to be simulated for large N provided
that optimum sectoring is used. The empirical data suggests that the speedup stabilizes
at approximately N = 100.

7.3  Impact  of  Spatial  Partitioning 

Sectoring  the  pool  table  results  in  a  tradeoff  between  decreased  search  space  and 
increased  number  of  events  to  process.  Each  additional  sector  adds  two  incremental  events 
to  process  for  pool  balls  which  must  cross  the  additional  sector.  These  events  are  not 
real  events  of  interest  and  therefore  represent  overhead.  Empirical  data  suggests  that  the 
optimal number of sectors increases with N, that it is dependent upon the ratio of pool
balls to table sectors (density), and that the optimum number of sectors for any given size
of N is independent of the cube size.

7.4  Determining the Optimal Number of Sectors

The empirical data shown in figure 24 suggests that the relationship between the
optimum number of sectors and the number of pool balls is logarithmic. Using linear
regression on an equation of the form Y = a log X + c, where Y is the optimum number
of sectors and X is the number of pool balls to be simulated, and minimizing the error
of the model results in a = 2.3 and c = -1.5. The empirical data is plotted with the
optimum sectoring estimate in figure 26.
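
A least-squares fit of a logarithmic model of this form can be sketched as follows. This is not the regression actually run for the thesis: the base of the logarithm is not stated in the text, so the natural logarithm is assumed here, the function name is hypothetical, and the data points are synthetic values generated from the fitted model itself rather than measured results.

```python
import math

def fit_log_model(xs, ys):
    """Least-squares fit of y = a*log(x) + c; returns (a, c)."""
    us = [math.log(x) for x in xs]   # assumption: natural logarithm
    n = len(us)
    mean_u = sum(us) / n
    mean_y = sum(ys) / n
    sxx = sum((u - mean_u) ** 2 for u in us)
    sxy = sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys))
    a = sxy / sxx
    c = mean_y - a * mean_u
    return a, c

# Synthetic points generated from the model, not measured thesis data.
balls = [50, 100, 200, 300, 400]
sectors = [2.3 * math.log(x) - 1.5 for x in balls]

a, c = fit_log_model(balls, sectors)
print(round(a, 3), round(c, 3))   # 2.3 -1.5
```

Because the model is linear in log X, ordinary linear regression on the transformed variable is sufficient; no iterative fitting is needed.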

Figure 26.  Curve Fitting to the Optimal Sector Data


7.5  Impact of the Conservative Paradigm upon Scalability

This thesis developed two formulations, MST_1(ν) and MST_2(ν), for calculating the
minimum safe time. The estimator MST_1(ν) represents the tightest upper bound
possible at the cost of global communications. The estimator MST_2(ν) complies with the
constraints formally presented by Chandy and Misra. This approach uses less information
to produce a lower estimate of the minimum safe time; however, communications time is
a constant limited only to nearest neighbor communications. The approach favored by
Chandy and Misra can result in an indeterminate number of iterations in which no sector
can meet its minimum safe time. The additional information provided by global
communications of the first scheme has been proved to guarantee that at least one sector can
always meet its minimum safe time. The efficiency curves of figure 23 clearly illustrate the
cost of global communications upon the parallel performance of the pool balls simulation. As
the number of nodes increases, the efficiency generally decreases. Due to the global
communications, MST_1(ν) is conjectured to yield superior speedup over the nearest neighbor
scheme for small M only, where M is at least four nodes. Due to the indeterminate number
of idle iterations which can result with MST_2(ν), it is not possible to predict the value of
M for which the two estimators will produce equal speedup without empirical test data.
The second estimator has not yet been implemented.

7.6  Conservative  Versus  Optimistic  Paradigms 

There are several differences between the pool balls simulation implemented by Cal
Tech and AFIT. The two most significant differences are the overall algorithm time
complexity and the hardware used to measure speedup. Cal Tech’s simulation incorporated an
O(N) search algorithm while AFIT’s simulation incorporated an O(N^2) algorithm. Cal
Tech used a JPL Mark III 32 node hypercube while AFIT uses an Intel iPSC/2 hypercube.
There were no quantitative or qualitative measurements for hardware granularity for
comparison. It is reasonable to assume that the measured speedup results for either simulation
design would be different if run on different machines; therefore, an accurate comparison is
not possible. Lin and Lazowska performed an analytical study of the two paradigms and
concluded that the optimistic approach is generally superior and that in the worst case,
an optimistic approach cannot lag arbitrarily behind the conservative model. The speedup
results presented in this thesis are approximately 35% higher on eight nodes than Cal
Tech’s reported speedup on eight nodes. While this does not disprove Lin and Lazowska’s
work, it does indicate that a conservative paradigm applied to a distributed discrete event
simulation can produce significant speedup. This has important ramifications because an
optimistic approach can require vast amounts of memory to execute. Chandy and Misra,
on the other hand, have shown that their paradigm requires a bounded amount of memory
and that the memory requirements are not more than for a sequential simulation. This
thesis concludes that the conservative paradigm has many useful applications where the
optimistic approach would otherwise exhaust memory. If designed properly, a conservative
simulation can produce significant speedup.

7.7  Recommendations for Future Study

The concept of spatially partitioning the pool table for the pool balls simulation
worked exceptionally well. This was due predominantly to the fact that the pool balls
were uniformly distributed. A battlefield simulation is not guaranteed to have this
advantage. Most of the computational load for a battlefield simulation occurs at the boundary
between opposing forces. To ensure that as many sectors as possible have objects located
within them, a dynamically defined sectoring scheme must be implemented. The pool balls
concept is readily understood and is therefore easier to work with than complex battlefield
simulations. It is recommended that a dynamically assigned sectoring strategy be
created using the pool balls problem domain. The insights gained from this endeavor would
apply to any non-uniformly distributed object model.

The design of the pool balls simulation intentionally incorporated an O(N^2)
algorithm to match the design with the granularity of the iPSC/2 hypercube. An O(N) algorithm
would require less overall execution time; therefore, it is recommended that the pool balls
simulation be re-designed to accommodate the O(N) algorithm. This allows further comparison
with the pool balls simulation tested at Cal Tech. If the speedup of the O(N) algorithm
implemented using a conservative paradigm still outperforms the O(N) algorithm using
an optimistic paradigm, then the degree of confidence in the assertions stated in this thesis
increases.

AFIT has been experimenting with a standard conservative synchronization
package called SPECTRUM. This package was originally written at the University of
Virginia and incorporates multiple filters for internodal communications. The main thrust of
SPECTRUM is to isolate machine dependent software in a low level implementation layer
of software and to separate the synchronization protocol from the specific application. It
is recommended that the pool balls software design be modified to interface to UVA’s
SPECTRUM package. This would standardize the pool balls simulation with other
simulations produced at AFIT and would allow the software to be more easily ported to other
distributed processing systems.

This thesis developed two minimum safe time estimators. The first estimator does
not conform to Chandy-Misra's paradigm, but it is more efficient for small cube sizes. The
second estimator does conform to Chandy-Misra's paradigm, but is less efficient and more
scalable. This thesis implemented the pool balls simulation using the first estimator which
is more efficient but less scalable. It is recommended that the pool balls simulation be
redesigned with the second estimator for detailed performance comparison. This
recommendation is based on the premise that massive parallelism is desired by the DoD to solve
many of its complex battle simulations and large VHDL descriptions.

A two dimensional sectoring strategy is recommended for future investigation.
Several large simulations, such as a battlefield simulation, are better suited to two dimensional
sectoring due to the distribution of objects. By partitioning the domain along two axes,
the objects can be dispersed with a better chance of achieving superior load balancing.
This conjecture can be tested with the pool balls problem.

Appendix A.  Software Listings


A.1  Software Files

The software for the parallel implementation of the pool balls problem includes the
following files:

•  Initialize.c  =>  Main Host Code

•  simdrive.c  =>  Main Node Code

•  ball_ADT.c

•  table_ADT.c

•  queue_ADT.c

•  EventHandler.c

•  Communicator.c

•  clock.c

•  event.c

•  neql.c

•  pool_balls.c

•  random.c

•  cube.h

•  event.h

•  prototype.h

•  structure.h

A.1.1  Functional Description

Initialize.c:  This is the main program for the host. Its primary functions are to
read in the command line arguments from the user, create the specified number of pool
balls, create the pool table with the specified table dimensions, initialize all simulation
parameters, place the pool balls into the appropriate table sectors, enforce Wieland's data
replication for pool balls residing in border regions, and communicate this information to
the individual nodes. The host program then enters a continuous loop until receiving a
message from a node indicating that the simulation is over, at which time the host kills all
processes.

simdrive.c:  This is the main program for the nodes. Its primary purpose is to
enforce the high level loop construct for the simulation. The loop consists of:


•  Determine the next event

•  Determine  the  minimum  safe  time 

•  Schedule  the  next  event(s) 

•  Execute  the  scheduled  event(s) 

•  Enforce  Data  Replication 

pool_balls.c:  This is the main application code driver for the nodes. It has an
initialization procedure which is called only once by the main code, simdrive.c. This
procedure receives the command line arguments sent from the host, creates the pool table
for each node, receives each pool ball sent from the host and inserts the pool balls into
the appropriate sector, and initializes all other packages which require initialization. It is
important to note that each node only receives the pool balls which are assigned to a sector
which resides on that node. Last, the initialization program determines the first event(s)
to be executed.

The program ‘pool_balls.c’ also has a procedure which is continuously called by the
main node program which implements (via procedure calls) the functions of executing an
event, determining the next event, determining the minimum safe time, and scheduling the
next event.

EventHandler.c:  This file performs two basic tasks: it determines the next event
possible for a pool ball which is passed to it, and it executes an event message which is
passed to it.

Communicator.c:  This file performs all of Wieland's data replication strategy.
Each time a pool ball is moved, this file is invoked. The pool ball state information which
is passed to the file is inspected. All replication and de-replication occurs here.

ball_ADT.c:  This file is implemented as an abstract data type. It represents the
ball object manager for each node. The manager can perform the following operations on
a pool ball:


•  Add

•  Remove (without returning a pool ball)

•  Get_and_Delete 

•  Get  (without  removing  a  pool  ball) 

Several  other  functions  are  available  but  do  not  change  the  state  of  the  ball  object 
manager.  These  operations  include  printing,  error  checking,  counting,  and  searching. 

clock.c:  Manages  the  simulation  clock. 

event.c:  Dynamically  allocates  and  deallocates  memory  for  events. 

neql.c:  Manages  the  next  event  queue.  Each  node  has  one  NEQ  and  each  NEQ 
stores  scheduled  events.  Each  NEQ  also  has  one  dummy  event  to  ensure  that  the  queue  is 
never  empty.  If  the  dummy  event  is  popped,  it  must  be  re-inserted  before  the  simulation 
can  proceed. 
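
The dummy event behaviour can be sketched as below. This is an illustration only and not the neql.c implementation: the class name, the use of a binary heap, and the infinite time stamp on the dummy event are all assumptions made for the example.

```python
import heapq
import math

# Sentinel event; giving it an infinite time stamp is an assumption.
DUMMY = (math.inf, "DUMMY")

class NextEventQueue:
    """Time-ordered event queue that always holds at least the dummy event."""

    def __init__(self):
        self._heap = [DUMMY]

    def insert(self, time, kind):
        heapq.heappush(self._heap, (time, kind))

    def pop(self):
        """Remove and return the earliest event, re-inserting the dummy if popped."""
        event = heapq.heappop(self._heap)
        if event == DUMMY:
            heapq.heappush(self._heap, DUMMY)
        return event

    def __len__(self):
        return len(self._heap)

neq = NextEventQueue()
neq.insert(0.42, "VERTICAL")
print(neq.pop())   # (0.42, 'VERTICAL')
print(neq.pop())   # (inf, 'DUMMY')
print(len(neq))    # 1, so the queue is never empty
```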

queue_ADT.c:  This file has two basic functions: it creates, manages and updates
the candidate queues, and it enforces the synchronization protocol. It currently enforces
the MST_1 estimator. Each sector has one candidate queue which stores candidate events.
After each sector determines its minimum safe time, each candidate queue is inspected to
determine if its candidate event is less than or equal to its minimum safe time. If it is,
then the candidate event is popped off the queue and inserted into the node's next event
queue. The event is now said to be scheduled. If the candidate event is greater than the
minimum safe time, then the candidate event is removed and its memory is freed.
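
The scheduling rule just described can be sketched as follows. The sketch takes the minimum safe times as given inputs and does not show how either estimator computes them; the function name, the dictionary layout and the sample values are hypothetical, and a sorted list stands in for the node's next event queue.

```python
def schedule_candidates(candidates, min_safe_times, next_event_queue):
    """Move safe candidate events into the next event queue; discard the rest.

    candidates:      {sector: (time, kind) or None}
    min_safe_times:  {sector: minimum safe time for that sector}
    """
    for sector, event in candidates.items():
        if event is None:
            continue
        time, _kind = event
        if time <= min_safe_times[sector]:
            next_event_queue.append(event)   # the event is now scheduled
        # Either way the candidate leaves its queue; an unsafe one is discarded.
        candidates[sector] = None
    next_event_queue.sort()
    return next_event_queue

candidates = {0: (1.5, "VERTICAL"), 1: (2.0, "EXIT"), 2: None}
min_safe = {0: 1.5, 1: 1.8, 2: 3.0}
print(schedule_candidates(candidates, min_safe, []))   # [(1.5, 'VERTICAL')]
```

In this example the sector 1 candidate at time 2.0 exceeds its minimum safe time of 1.8, so it is dropped and would have to be determined again on a later pass of the loop.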

random.c:  This file is a machine independent, pseudo-random number generator
based on the works of Law and Kelton. It can return random numbers with the following
distributions:


•  uniform 

•  exponential 

•  normal 

•  logarithmic 


table_ADT.c:  This file is implemented as an abstract data type. The ADT
represents the table sector manager. It not only creates the table and sectors, but it returns
information about the table such as the table length, table width, and sector coordinates.

cube.h:  This header file includes all cube specific information as ‘define’ statements
in the C programming language.

event.h:  This header file defines the data structure of an event type.

prototype.h:  This header file prototypes all of the files used in the pool balls
simulation.

structure.h:  This header file defines all global structures such as the ball structure,
linked list structure, and sector structure. It also defines several global variables such as
the length of a pool ball radius, the value of π, the maximum allowable X and Y axis pool
ball velocities and some key initialization parameters such as writing to disk and printing
to screen.

A.2  Compiling Instructions

The  makefile  specifies  how  the  various  host  and  node  files  compile.  The  makefile  is 
as  follows: 


host:   Initialize.o random.o ball_ADT.o table_ADT.o
	cc -o host Initialize.o random.o ball_ADT.o table_ADT.o -lm -host

node:   simdrive.o pool_balls.o clock.o neql.o \
        event.o random.o ball_ADT.o table_ADT.o \
        EventHandler.o Communicator.o queue_ADT.o
	cc -o node simdrive.o pool_balls.o clock.o neql.o \
	   event.o random.o ball_ADT.o table_ADT.o \
	   EventHandler.o Communicator.o queue_ADT.o \
	   -lm -node

Initialize.o:   Initialize.c structure.h cube.h prototype.h
	cc -c Initialize.c

simdrive.o:     simdrive.c event.h cube.h prototype.h structure.h
	cc -c simdrive.c

clock.o:        clock.c
	cc -c clock.c

neql.o:         neql.c event.h
	cc -c neql.c

event.o:        event.c event.h
	cc -c event.c

random.o:       random.c
	cc -c random.c

pool_balls.o:   pool_balls.c structure.h prototype.h
	cc -c pool_balls.c

ball_ADT.o:     ball_ADT.c structure.h prototype.h
	cc -c ball_ADT.c

table_ADT.o:    table_ADT.c structure.h
	cc -c table_ADT.c

EventHandler.o: EventHandler.c structure.h event.h prototype.h
	cc -c EventHandler.c

queue_ADT.o:    queue_ADT.c structure.h event.h prototype.h cube.h
	cc -c queue_ADT.c

Communicator.o: Communicator.c structure.h event.h prototype.h cube.h
	cc -c Communicator.c